[RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU

Linux-mm Archive on lore.kernel.org

* [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
@ 2026-06-10 12:05 zhaoyang.huang
  2026-06-10 12:50 ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 5+ messages in thread
From: zhaoyang.huang @ 2026-06-10 12:05 UTC
  To: Andrew Morton, David Hildenbrand, Zi Yan, Lorenzo Stoakes,
	Barry Song, Baolin Wang, Lance Yang, Liam R . Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, linux-mm, linux-kernel, Zhaoyang Huang,
	steve.kang

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Kernel panics keep being reported, especially when the f2fs partition
is almost full. Our investigation found that an f2fs page was freed to
the buddy allocator without first being removed from the LRU. The root
cause is the race shown in [2], which is exposed by f2fs commit
9609dd704725 ("f2fs: remove non-uptodate folio from the page cache in
move_data_block"). Reverting that commit resolves the issue.

Three racing contexts are involved in this scenario; their main
activities are described below. However, further investigation of the
code suggests a more general race window for truncated folios between
split_folio_to_order() and folio_isolate_lru(): once a folio has lost
its page-cache references and is held only by the split caller's
transient reference, it can enter the free path and compete with the
isolation path. This patch therefore proposes keeping split folios
beyond EOF off the LRU.

Truncate:
The code changed by commit 9609dd704725 in move_data_block() lets the
GC path evict the tail-end folio from the page cache through
folio_end_dropbehind().  Once folio_unmap_invalidate() removes the
folio from mapping->i_pages, the
page-cache references for all pages in the folio are dropped.  The folio
is then kept alive only by temporary external references, which allows a
later split to operate on a folio whose subpages are no longer protected
by page-cache references.

Split:
After the page-cache references are gone, split_folio_to_order() can
split the big folio into individual pages and put the resulting subpages
back on the LRU.  For tail pages beyond EOF, split removes them from the
page cache and drops their page-cache references.  A tail page can then
remain on the LRU with PG_lru set while holding only the split caller's
temporary reference.  When free_folio_and_swap_cache() drops that final
reference, the page enters the final folio_put() release path.

Isolate:
In parallel, folio_isolate_lru() can observe the same tail page with a
non-zero refcount and PG_lru set.  It clears PG_lru before taking its own
reference.  If this races with the final folio_put() from the split path,
__folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
The page is then freed back to the allocator while its lru links are
still present in the LRU list.  A later LRU operation on a neighboring
page detects the stale link and reports list corruption.

[1]
[   22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
[   22.486130] ------------[ cut here ]------------
[   22.486134] kernel BUG at lib/list_debug.c:67!
[   22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
[   22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
[   22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
[   22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
[   22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
[   22.488539] sp : ffffffc08006b830
[   22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
[   22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
[   22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
[   22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
[   22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
[   22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
[   22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
[   22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
[   22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
[   22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
[   22.488647] Call trace:
[   22.488651]  __list_del_entry_valid_or_report+0x14c/0x154 (P)
[   22.488661]  __folio_put+0x2bc/0x434
[   22.488670]  folio_put+0x28/0x58
[   22.488678]  do_garbage_collect+0x1a34/0x2584
[   22.488689]  f2fs_gc+0x230/0x9b4
[   22.488697]  f2fs_fallocate+0xb90/0xdf4
[   22.488706]  vfs_fallocate+0x1b4/0x2bc
[   22.488716]  __arm64_sys_fallocate+0x44/0x78
[   22.488725]  invoke_syscall+0x58/0xe4
[   22.488732]  do_el0_svc+0x48/0xdc
[   22.488739]  el0_svc+0x3c/0x98
[   22.488747]  el0t_64_sync_handler+0x20/0x130
[   22.488754]  el0t_64_sync+0x1c4/0x1c8

[2]
CPU0 (f2fs GC)              CPU1 (split_folio_to_order)          CPU2 (folio_isolate_lru)

F: pagecache refs = n
F: extra refs = GC + split
F: PG_lru set
move_data_block()
folio = f2fs_grab_cache_folio(F)
...
__folio_set_dropbehind(F)
folio_unlock(F)
folio_end_dropbehind(F)
  folio_unmap_invalidate(F)
    __filemap_remove_folio(F)
    folio_put_refs(F, n)
folio_put(F)
                            split_folio_to_order(F)
                              folio_ref_freeze(F, 1)
                              ...
                              lru_add_split_folio(T)
                                list_add_tail(&T->lru, &F->lru)
                                folio_set_lru(T)
                              __filemap_remove_folio(T)
                              folio_put_refs(T, 1)
                              /* T refcount == 1, PageLRU set */
                            free_folio_and_swap_cache(T)
                              folio_put(T)
                                /* refcount: 1 -> 0 */
                                                                  folio_isolate_lru(T)
                                                                    folio_test_clear_lru(T)
                                __folio_put(T)
                                  __page_cache_release(T)
                                    folio_test_lru(T) == false
                                    /* skip lruvec_del_folio(T) */
                                  free_frozen_pages(T)
                                                                  folio_get(T)
                                                                  lruvec_del_folio(T)
later:
  list_del(adjacent->lru)
    next == &T->lru
    next->prev == LIST_POISON / PCP freelist
    BUG

Assisted-by: Cursor:claude-opus-4-8
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..7465525a94a8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 			folio_ref_unfreeze(new_folio,
 					   folio_cache_ref_count(new_folio) + 1);

-			if (do_lru)
+			if (do_lru && !(mapping && new_folio->index >= end))
 				lru_add_split_folio(folio, new_folio, lruvec, list);

 			/*
-- 
2.25.1


* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
  2026-06-10 12:05 [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU zhaoyang.huang
@ 2026-06-10 12:50 ` David Hildenbrand (Arm)
  2026-06-10 14:38   ` Zi Yan
  0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-10 12:50 UTC
  To: zhaoyang.huang, Andrew Morton, Zi Yan, Lorenzo Stoakes,
	Barry Song, Baolin Wang, Lance Yang, Liam R . Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, linux-mm, linux-kernel, Zhaoyang Huang,
	steve.kang

On 6/10/26 14:05, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> Kernel panics keep being reported, especially when the f2fs partition
> is almost full. Our investigation found that an f2fs page was freed to
> the buddy allocator without first being removed from the LRU. The root
> cause is the race shown in [2], which is exposed by f2fs commit
> 9609dd704725 ("f2fs: remove non-uptodate folio from the page cache in
> move_data_block"). Reverting that commit resolves the issue.

But I assume that other FSes can trigger this as well? Any insights?

> 
> Three racing contexts are involved in this scenario; their main
> activities are described below. However, further investigation of the
> code suggests a more general race window for truncated folios between
> split_folio_to_order() and folio_isolate_lru(): once a folio has lost
> its page-cache references and is held only by the split caller's
> transient reference, it can enter the free path and compete with the
> isolation path. This patch therefore proposes keeping split folios
> beyond EOF off the LRU.
> 
> Truncate:
> The code changed by commit 9609dd704725 in move_data_block() lets the
> GC path evict the tail-end folio from the page cache through
> folio_end_dropbehind().  Once folio_unmap_invalidate() removes the
> folio from mapping->i_pages, the
> page-cache references for all pages in the folio are dropped.  The folio
> is then kept alive only by temporary external references, which allows a
> later split to operate on a folio whose subpages are no longer protected
> by page-cache references.
> 
> Split:
> After the page-cache references are gone, split_folio_to_order() can
> split the big folio into individual pages and put the resulting subpages
> back on the LRU.  For tail pages beyond EOF, split removes them from the
> page cache and drops their page-cache references.  A tail page can then
> remain on the LRU with PG_lru set while holding only the split caller's
> temporary reference.  When free_folio_and_swap_cache() drops that final
> reference, the page enters the final folio_put() release path.
> 
> Isolate:
> In parallel, folio_isolate_lru() can observe the same tail page with a
> non-zero refcount and PG_lru set.  It clears PG_lru before taking its own
> reference.  If this races with the final folio_put() from the split path,
> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
> The page is then freed back to the allocator while its lru links are
> still present in the LRU list.  A later LRU operation on a neighboring
> page detects the stale link and reports list corruption.

Complicated mess :(

So, folio_isolate_lru() really only requires the caller to hold a folio
reference, which can happen given that we did the folio_ref_unfreeze(). It can,
for example, be triggered by memory offlining or page migration.

So we really want to not allow folio_isolate_lru() while we are still processing
the folio.

What your patch does is, simply not add folios that we will drop from the page
cache to the LRU?


You should describe here how you are fixing it: "Let's fix it by..."

> 
> [1]
> [   22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
> [   22.486130] ------------[ cut here ]------------
> [   22.486134] kernel BUG at lib/list_debug.c:67!
> [   22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
> [   22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
> [   22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
> [   22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
> [   22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
> [   22.488539] sp : ffffffc08006b830
> [   22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
> [   22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
> [   22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
> [   22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
> [   22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
> [   22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
> [   22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
> [   22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
> [   22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
> [   22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
> [   22.488647] Call trace:
> [   22.488651]  __list_del_entry_valid_or_report+0x14c/0x154 (P)
> [   22.488661]  __folio_put+0x2bc/0x434
> [   22.488670]  folio_put+0x28/0x58
> [   22.488678]  do_garbage_collect+0x1a34/0x2584
> [   22.488689]  f2fs_gc+0x230/0x9b4
> [   22.488697]  f2fs_fallocate+0xb90/0xdf4
> [   22.488706]  vfs_fallocate+0x1b4/0x2bc
> [   22.488716]  __arm64_sys_fallocate+0x44/0x78
> [   22.488725]  invoke_syscall+0x58/0xe4
> [   22.488732]  do_el0_svc+0x48/0xdc
> [   22.488739]  el0_svc+0x3c/0x98
> [   22.488747]  el0t_64_sync_handler+0x20/0x130
> [   22.488754]  el0t_64_sync+0x1c4/0x1c8
> 
> [2]
> CPU0 (f2fs GC)              CPU1 (split_folio_to_order)          CPU2 (folio_isolate_lru)
> 
> F: pagecache refs = n
> F: extra refs = GC + split
> F: PG_lru set
> move_data_block()
> folio = f2fs_grab_cache_folio(F)
> ...
> __folio_set_dropbehind(F)
> folio_unlock(F)
> folio_end_dropbehind(F)
>   folio_unmap_invalidate(F)
>     __filemap_remove_folio(F)
>     folio_put_refs(F, n)
> folio_put(F)
>                             split_folio_to_order(F)
>                               folio_ref_freeze(F, 1)
>                               ...
>                               lru_add_split_folio(T)
>                                 list_add_tail(&T->lru, &F->lru)
>                                 folio_set_lru(T)
>                               __filemap_remove_folio(T)
>                               folio_put_refs(T, 1)
>                               /* T refcount == 1, PageLRU set */
>                             free_folio_and_swap_cache(T)
>                               folio_put(T)
>                                 /* refcount: 1 -> 0 */
>                                                                   folio_isolate_lru(T)
>                                                                     folio_test_clear_lru(T)
>                                 __folio_put(T)
>                                   __page_cache_release(T)
>                                     folio_test_lru(T) == false
>                                     /* skip lruvec_del_folio(T) */
>                                   free_frozen_pages(T)
>                                                                   folio_get(T)
>                                                                   lruvec_del_folio(T)
> later:
>   list_del(adjacent->lru)
>     next == &T->lru
>     next->prev == LIST_POISON / PCP freelist
>     BUG
> 
> Assisted-by: Cursor:claude-opus-4-8
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

I'm wondering if this has been broken the whole time, or if some rework allowed
this to trigger.

I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable?

Looking into the history, I think we always unconditionally did the
lru_add_split_folio()/lru_add_page_tail().

> ---
>  mm/huge_memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 970e077019b7..7465525a94a8 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>  			folio_ref_unfreeze(new_folio,
>  					   folio_cache_ref_count(new_folio) + 1);
>  
> -			if (do_lru)
> +			if (do_lru && !(mapping && new_folio->index >= end))

It might be clearer to write this as

	do_lru && (!mapping || new_folio->index < end)

To match the page-cache check further below

	if (!mapping)
		continue

	...
	if (new_folio->index < end)
		...

>  				lru_add_split_folio(folio, new_folio, lruvec, list);
>  
>  			/*

folio_check_splittable() makes sure that we have a mapping for non-anon folios.
(no truncation). end is then only set for non-anon folios.

@Zi, any thoughts?

-- 
Cheers,

David



* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
  2026-06-10 12:50 ` David Hildenbrand (Arm)
@ 2026-06-10 14:38   ` Zi Yan
  2026-06-10 17:25     ` Zi Yan
  0 siblings, 1 reply; 5+ messages in thread
From: Zi Yan @ 2026-06-10 14:38 UTC
  To: David Hildenbrand (Arm), zhaoyang.huang
  Cc: Andrew Morton, Lorenzo Stoakes, Barry Song, Baolin Wang,
	Lance Yang, Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	linux-mm, linux-kernel, Zhaoyang Huang, steve.kang

On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote:

> On 6/10/26 14:05, zhaoyang.huang wrote:
>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>
>> Kernel panics keep being reported, especially when the f2fs partition
>> is almost full. Our investigation found that an f2fs page was freed to
>> the buddy allocator without first being removed from the LRU. The root
>> cause is the race shown in [2], which is exposed by f2fs commit
>> 9609dd704725 ("f2fs: remove non-uptodate folio from the page cache in
>> move_data_block"). Reverting that commit resolves the issue.
>
> But I assume that other FSes can trigger this as well? Any insights?
>
>>
>> Three racing contexts are involved in this scenario; their main
>> activities are described below. However, further investigation of the
>> code suggests a more general race window for truncated folios between
>> split_folio_to_order() and folio_isolate_lru(): once a folio has lost
>> its page-cache references and is held only by the split caller's
>> transient reference, it can enter the free path and compete with the
>> isolation path. This patch therefore proposes keeping split folios
>> beyond EOF off the LRU.
>>
>> Truncate:
>> The code changed by commit 9609dd704725 in move_data_block() lets the
>> GC path evict the tail-end folio from the page cache through
>> folio_end_dropbehind().  Once folio_unmap_invalidate() removes the
>> folio from mapping->i_pages, the
>> page-cache references for all pages in the folio are dropped.  The folio
>> is then kept alive only by temporary external references, which allows a
>> later split to operate on a folio whose subpages are no longer protected
>> by page-cache references.
>>
>> Split:
>> After the page-cache references are gone, split_folio_to_order() can
>> split the big folio into individual pages and put the resulting subpages
>> back on the LRU.  For tail pages beyond EOF, split removes them from the
>> page cache and drops their page-cache references.  A tail page can then
>> remain on the LRU with PG_lru set while holding only the split caller's
>> temporary reference.  When free_folio_and_swap_cache() drops that final
>> reference, the page enters the final folio_put() release path.
>>
>> Isolate:
>> In parallel, folio_isolate_lru() can observe the same tail page with a
>> non-zero refcount and PG_lru set.  It clears PG_lru before taking its own
>> reference.  If this races with the final folio_put() from the split path,
>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
>> The page is then freed back to the allocator while its lru links are
>> still present in the LRU list.  A later LRU operation on a neighboring
>> page detects the stale link and reports list corruption.
>
> Complicated mess :(
>
> So, folio_isolate_lru() really only requires the caller to hold a folio
> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
> for example, be triggered by memory offlining or page migration.
>
> So we really want to not allow folio_isolate_lru() while we are still processing
> the folio.

Or we should defer adding split folios to the LRU until after unfreeze.

>
> What your patch does is, simply not add folios that we will drop from the page
> cache to the LRU?
>
>
> You should describe here how you are fixing it: "Let's fix it by..."
>
>>
>> [1]
>> [   22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
>> [   22.486130] ------------[ cut here ]------------
>> [   22.486134] kernel BUG at lib/list_debug.c:67!
>> [   22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
>> [   22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
>> [   22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
>> [   22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [   22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
>> [   22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
>> [   22.488539] sp : ffffffc08006b830
>> [   22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
>> [   22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
>> [   22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
>> [   22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
>> [   22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
>> [   22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
>> [   22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
>> [   22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
>> [   22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
>> [   22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
>> [   22.488647] Call trace:
>> [   22.488651]  __list_del_entry_valid_or_report+0x14c/0x154 (P)
>> [   22.488661]  __folio_put+0x2bc/0x434
>> [   22.488670]  folio_put+0x28/0x58
>> [   22.488678]  do_garbage_collect+0x1a34/0x2584
>> [   22.488689]  f2fs_gc+0x230/0x9b4
>> [   22.488697]  f2fs_fallocate+0xb90/0xdf4
>> [   22.488706]  vfs_fallocate+0x1b4/0x2bc
>> [   22.488716]  __arm64_sys_fallocate+0x44/0x78
>> [   22.488725]  invoke_syscall+0x58/0xe4
>> [   22.488732]  do_el0_svc+0x48/0xdc
>> [   22.488739]  el0_svc+0x3c/0x98
>> [   22.488747]  el0t_64_sync_handler+0x20/0x130
>> [   22.488754]  el0t_64_sync+0x1c4/0x1c8
>>
>> [2]
>> CPU0 (f2fs GC)              CPU1 (split_folio_to_order)          CPU2 (folio_isolate_lru)
>>
>> F: pagecache refs = n
>> F: extra refs = GC + split
>> F: PG_lru set
>> move_data_block()
>> folio = f2fs_grab_cache_folio(F)
>> ...
>> __folio_set_dropbehind(F)
>> folio_unlock(F)
>> folio_end_dropbehind(F)
>>   folio_unmap_invalidate(F)
>>     __filemap_remove_folio(F)
>>     folio_put_refs(F, n)
>> folio_put(F)
>>                             split_folio_to_order(F)
>>                               folio_ref_freeze(F, 1)
>>                               ...
>>                               lru_add_split_folio(T)
>>                                 list_add_tail(&T->lru, &F->lru)
>>                                 folio_set_lru(T)
>>                               __filemap_remove_folio(T)
>>                               folio_put_refs(T, 1)
>>                               /* T refcount == 1, PageLRU set */
>>                             free_folio_and_swap_cache(T)
>>                               folio_put(T)
>>                                 /* refcount: 1 -> 0 */
>>                                                                   folio_isolate_lru(T)

If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in
folio_isolate_lru() would be triggered. Maybe we could just return false in that case.

>>                                                                     folio_test_clear_lru(T)
>>                                 __folio_put(T)
>>                                   __page_cache_release(T)
>>                                     folio_test_lru(T) == false
>>                                     /* skip lruvec_del_folio(T) */
>>                                   free_frozen_pages(T)
>>                                                                   folio_get(T)
>>                                                                   lruvec_del_folio(T)

But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from the LRU list.

>> later:
>>   list_del(adjacent->lru)
>>     next == &T->lru
>>     next->prev == LIST_POISON / PCP freelist
>>     BUG
>>

Why does CPU0 still see the stale link from adjacent?

>> Assisted-by: Cursor:claude-opus-4-8
>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> I'm wondering if this has been broken the whole time, or if some rework allowed
> this to trigger.
>
> I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable?
>
> Looking into the history, I think we always unconditionally did the
> lru_add_split_folio()/lru_add_page_tail().
>
>> ---
>>  mm/huge_memory.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 970e077019b7..7465525a94a8 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>>  			folio_ref_unfreeze(new_folio,
>>  					   folio_cache_ref_count(new_folio) + 1);
>>
>> -			if (do_lru)
>> +			if (do_lru && !(mapping && new_folio->index >= end))
>
> It might be clearer to write this as
>
> 	do_lru && (!mapping || new_folio->index < end)
>
> To match the page-cache check further below
>
> 	if (!mapping)
> 		continue
>
> 	...
> 	if (new_folio->index < end)
> 		...
>
>>  				lru_add_split_folio(folio, new_folio, lruvec, list);
>>
>>  			/*
>
> folio_check_splittable() makes sure that we have a mapping for non-anon folios.
> (no truncation). end is then only set for non-anon folios.
>
> @Zi, any thoughts?

The fix works but I feel that it is masking the race between folio_isolate_lru() and
folio_put(). I worry that the same issue might be triggered in other ways or
in new code if we do not fix the race.

To summarize my thoughts above:
1. adding frozen folios to the LRU might be problematic, since folio_isolate_lru()
has a VM_BUG_ON_FOLIO() for it but still proceeds with the isolation.

2. the race analysis is not clear, since both folio_isolate_lru() and folio_put()
do lruvec_del_folio() if the folio is on the LRU. When list_del(adjacent->lru) sees
the stale link, the folio is already in buddy and page->lru is modified for
PageBuddy use? So even without CPU0, folio_isolate_lru()'s lruvec_del_folio()
can do the wrong thing on pages in buddy?


--
Best Regards,
Yan, Zi



* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
  2026-06-10 14:38   ` Zi Yan
@ 2026-06-10 17:25     ` Zi Yan
  2026-06-10 18:44       ` Zi Yan
  0 siblings, 1 reply; 5+ messages in thread
From: Zi Yan @ 2026-06-10 17:25 UTC
  To: David Hildenbrand (Arm), zhaoyang.huang
  Cc: Andrew Morton, Lorenzo Stoakes, Barry Song, Baolin Wang,
	Lance Yang, Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	linux-mm, linux-kernel, Zhaoyang Huang, steve.kang

On 10 Jun 2026, at 10:38, Zi Yan wrote:

> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote:
>
>> On 6/10/26 14:05, zhaoyang.huang wrote:
>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>
>>> Kernel panics keep being reported, especially when the f2fs partition
>>> is almost full. Our investigation found that an f2fs page was freed to
>>> the buddy allocator without first being removed from the LRU. The root
>>> cause is the race shown in [2], which is exposed by f2fs commit
>>> 9609dd704725 ("f2fs: remove non-uptodate folio from the page cache in
>>> move_data_block"). Reverting that commit resolves the issue.
>>
>> But I assume that other FSes can trigger this as well? Any insights?
>>
>>>
>>> Three racing contexts are involved in this scenario; their main
>>> activities are described below. However, further investigation of the
>>> code suggests a more general race window for truncated folios between
>>> split_folio_to_order() and folio_isolate_lru(): once a folio has lost
>>> its page-cache references and is held only by the split caller's
>>> transient reference, it can enter the free path and compete with the
>>> isolation path. This patch therefore proposes keeping split folios
>>> beyond EOF off the LRU.
>>>
>>> Truncate:
>>> The code changed by commit 9609dd704725 in move_data_block() lets the
>>> GC path evict the tail-end folio from the page cache through
>>> folio_end_dropbehind().  Once folio_unmap_invalidate() removes the
>>> folio from mapping->i_pages, the
>>> page-cache references for all pages in the folio are dropped.  The folio
>>> is then kept alive only by temporary external references, which allows a
>>> later split to operate on a folio whose subpages are no longer protected
>>> by page-cache references.
>>>
>>> Split:
>>> After the page-cache references are gone, split_folio_to_order() can
>>> split the big folio into individual pages and put the resulting subpages
>>> back on the LRU.  For tail pages beyond EOF, split removes them from the
>>> page cache and drops their page-cache references.  A tail page can then
>>> remain on the LRU with PG_lru set while holding only the split caller's
>>> temporary reference.  When free_folio_and_swap_cache() drops that final
>>> reference, the page enters the final folio_put() release path.
>>>
>>> Isolate:
>>> In parallel, folio_isolate_lru() can observe the same tail page with a
>>> non-zero refcount and PG_lru set.  It clears PG_lru before taking its own
>>> reference.  If this races with the final folio_put() from the split path,
>>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
>>> The page is then freed back to the allocator while its lru links are
>>> still present in the LRU list.  A later LRU operation on a neighboring
>>> page detects the stale link and reports list corruption.
>>
>> Complicated mess :(
>>
>> So, folio_isolate_lru() really only requires the caller to hold a folio
>> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
>> for example, be triggered by memory offlining or page migration.
>>
>> So we really want to not allow folio_isolate_lru() while we are still processing
>> the folio.
>
> Or we should defer adding split folios to the LRU until after unfreeze.
>
>>
>> What your patch does is, simply not add folios that we will drop from the page
>> cache to the LRU?
>>
>>
>> You should describe here how you are fixing it: "Let's fix it by..."
>>
>>>
>>> [1]
>>> [   22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
>>> [   22.486130] ------------[ cut here ]------------
>>> [   22.486134] kernel BUG at lib/list_debug.c:67!
>>> [   22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
>>> [   22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
>>> [   22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
>>> [   22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [   22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
>>> [   22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
>>> [   22.488539] sp : ffffffc08006b830
>>> [   22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
>>> [   22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
>>> [   22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
>>> [   22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
>>> [   22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
>>> [   22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
>>> [   22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
>>> [   22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
>>> [   22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
>>> [   22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
>>> [   22.488647] Call trace:
>>> [   22.488651]  __list_del_entry_valid_or_report+0x14c/0x154 (P)
>>> [   22.488661]  __folio_put+0x2bc/0x434
>>> [   22.488670]  folio_put+0x28/0x58
>>> [   22.488678]  do_garbage_collect+0x1a34/0x2584
>>> [   22.488689]  f2fs_gc+0x230/0x9b4
>>> [   22.488697]  f2fs_fallocate+0xb90/0xdf4
>>> [   22.488706]  vfs_fallocate+0x1b4/0x2bc
>>> [   22.488716]  __arm64_sys_fallocate+0x44/0x78
>>> [   22.488725]  invoke_syscall+0x58/0xe4
>>> [   22.488732]  do_el0_svc+0x48/0xdc
>>> [   22.488739]  el0_svc+0x3c/0x98
>>> [   22.488747]  el0t_64_sync_handler+0x20/0x130
>>> [   22.488754]  el0t_64_sync+0x1c4/0x1c8
>>>
>>> [2]
>>> CPU0 (f2fs GC)              CPU1 (split_folio_to_order)          CPU2 (folio_isolate_lru)
>>>
>>> F: pagecache refs = n
>>> F: extra refs = GC + split
>>> F: PG_lru set
>>> move_data_block()
>>> folio = f2fs_grab_cache_folio(F)
>>> ...
>>> __folio_set_dropbehind(F)
>>> folio_unlock(F)
>>> folio_end_dropbehind(F)
>>>   folio_unmap_invalidate(F)
>>>     __filemap_remove_folio(F)
>>>     folio_put_refs(F, n)
>>> folio_put(F)
>>>                             split_folio_to_order(F)
>>>                               folio_ref_freeze(F, 1)
>>>                               ...
>>>                               lru_add_split_folio(T)
>>>                                 list_add_tail(&T->lru, &F->lru)
>>>                                 folio_set_lru(T)
>>>                               __filemap_remove_folio(T)
>>>                               folio_put_refs(T, 1)
>>>                               /* T refcount == 1, PageLRU set */
>>>                             free_folio_and_swap_cache(T)
>>>                               folio_put(T)
>>>                                 /* refcount: 1 -> 0 */
>>>                                                                   folio_isolate_lru(T)
>
> If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in
> folio_isolate_lru() would be triggered. Maybe we could just return false in that case.
>
>>>                                                                     folio_test_clear_lru(T)
>>>                                 __folio_put(T)
>>>                                   __page_cache_release(T)
>>>                                     folio_test_lru(T) == false
>>>                                     /* skip lruvec_del_folio(T) */
>>>                                   free_frozen_pages(T)
>>>                                                                   folio_get(T)
>>>                                                                   lruvec_del_folio(T)
>
> But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from LRU list.
>
>>> later:
>>>   list_del(adjacent->lru)
>>>     next == &T->lru
>>>     next->prev == LIST_POISON / PCP freelist
>>>     BUG
>>>
>
> Why does CPU0 still see the stale link from adjacent?
>
>>> Assisted-by: Cursor:claude-opus-4-8
>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>
>> I'm wondering if this has been broken the whole time, or if some rework allowed
>> this to trigger.
>>
>> I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable?
>>
>> Looking into the history, I think we always unconditionally did the
>> lru_add_split_folio()/lru_add_page_tail().
>>
>>> ---
>>>  mm/huge_memory.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 970e077019b7..7465525a94a8 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>>>  			folio_ref_unfreeze(new_folio,
>>>  					   folio_cache_ref_count(new_folio) + 1);
>>>
>>> -			if (do_lru)
>>> +			if (do_lru && !(mapping && new_folio->index >= end))
>>
>> It might be clearer to write this as
>>
>> 	do_lru && (!mapping || new_folio->index < end)
>>
>> To match the page-cache check further below
>>
>> 	if (!mapping)
>> 		continue
>>
>> 	...
>> 	if (new_folio->index < end)
>> 		...
>>
>>>  				lru_add_split_folio(folio, new_folio, lruvec, list);

Talked to Claude and found an accounting issue with this. If the
after-split folios beyond EOF are not put back on the LRU, they never go
through lruvec_del_folio(). That function decrements the NR_*_LRU
counters when it removes a folio from the LRU, so skipping it leaves the
NR_*_LRU accounting wrong. Note that the original folio stays on the LRU
the whole time and the LRU counters are not modified during the split.
The split only shrinks the original folio, so the after-split folios must
be added back to the LRU to keep the counters right. If we decide not to
add them back, we will need to adjust the LRU accounting for the folios
that fail the (!mapping || new_folio->index < end) check.
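
To make the accounting concern concrete, here is a minimal user-space
model, not kernel code. struct model_folio, nr_lru, keep_on_lru() and
simulate_split() are hypothetical names. nr_lru stands in for the
NR_*_LRU counters. The model assumes, as described above, that linking
the after-split folios onto the LRU does not touch the counters. It only
splits into order-0 folios and ignores locking.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of NR_*_LRU accounting across a split.
 * None of these names are kernel APIs; nr_lru stands in for NR_*_LRU.
 */
#define MAX_SPLIT 16

struct model_folio {
	long index;   /* page-cache index of the first page */
	int nr_pages; /* number of pages in this folio */
	bool on_lru;  /* models PG_lru plus LRU list membership */
};

static long nr_lru; /* models the NR_*_LRU counter */

/* Counterpart of lruvec_add_folio(): link and account the folio. */
static void model_add_folio(struct model_folio *f)
{
	f->on_lru = true;
	nr_lru += f->nr_pages;
}

/* Counterpart of lruvec_del_folio(): unlink and un-account the folio. */
static void model_del_folio(struct model_folio *f)
{
	if (f->on_lru) {
		f->on_lru = false;
		nr_lru -= f->nr_pages;
	}
}

/* The proposed condition: keep a tail on the LRU unless it lies beyond EOF. */
static bool keep_on_lru(bool do_lru, bool has_mapping, long index, long end)
{
	return do_lru && (!has_mapping || index < end);
}

/*
 * Split a file folio of nr pages at index 0 into order-0 folios, where end
 * is the first index beyond EOF, then remove everything from the LRU again.
 * Returns the leftover counter value: 0 means the accounting balanced.
 */
static long simulate_split(int nr, long end, bool skip_eof, bool fixup)
{
	struct model_folio f[MAX_SPLIT];
	int i;

	assert(nr >= 1 && nr <= MAX_SPLIT);
	nr_lru = 0;
	f[0] = (struct model_folio){ .index = 0, .nr_pages = nr };
	model_add_folio(&f[0]);  /* the large folio is accounted once */

	f[0].nr_pages = 1;       /* head shrinks; counter left untouched */
	for (i = 1; i < nr; i++) {
		f[i] = (struct model_folio){ .index = i, .nr_pages = 1 };
		if (!skip_eof || keep_on_lru(true, true, f[i].index, end))
			f[i].on_lru = true;      /* linked without a counter update */
		else if (fixup)
			nr_lru -= f[i].nr_pages; /* explicit correction for a skipped tail */
	}

	for (i = 0; i < nr; i++)
		model_del_folio(&f[i]);
	return nr_lru;
}
```

With nr = 8 and end = 5, skipping the three tails beyond EOF leaves
nr_lru at 3 unless the explicit correction is applied, which is the kind
of drift described above.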


Best Regards,
Yan, Zi



* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
  2026-06-10 17:25     ` Zi Yan
@ 2026-06-10 18:44       ` Zi Yan
  0 siblings, 0 replies; 5+ messages in thread
From: Zi Yan @ 2026-06-10 18:44 UTC
  To: David Hildenbrand (Arm), zhaoyang.huang
  Cc: Andrew Morton, Lorenzo Stoakes, Barry Song, Baolin Wang,
	Lance Yang, Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	linux-mm, linux-kernel, Zhaoyang Huang, steve.kang

On 10 Jun 2026, at 13:25, Zi Yan wrote:

> On 10 Jun 2026, at 10:38, Zi Yan wrote:
>
>> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote:
>>
>>> On 6/10/26 14:05, zhaoyang.huang wrote:
>>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>>
>>>> Kernel panics keep being reported, especially when the f2fs
>>>> partition is almost full. Our investigation found that an f2fs page
>>>> was freed to the buddy allocator without being deleted from the LRU.
>>>> The root cause is the race shown in [2], which was introduced by f2fs
>>>> commit 9609dd704725 ("f2fs: remove non-uptodate folio from the page
>>>> cache in move_data_block"). Reverting that commit resolves the issue.
>>>
>>> But I assume, that other FSes can trigger this as well? Any insights?
>>>
>>>>
>>>> Three racing paths are involved in this scenario; their main
>>>> activities are described below. However, further investigation of the
>>>> code suggests a more general race window for truncated folios between
>>>> split_folio_to_order() and folio_isolate_lru(). Once the folios have
>>>> lost their page-cache references, only the split caller's transient
>>>> reference keeps them alive, so a folio can enter the free path while
>>>> it is being isolated. This patch therefore proposes keeping the folios
>>>> beyond EOF off the LRU.
>>>>
>>>> Truncate:
>>>> The changed code in move_data_block() lets the GC path evict the tail-end
>>>> folio from the page cache through folio_end_dropbehind().  Once
>>>> folio_unmap_invalidate() removes the folio from mapping->i_pages, the
>>>> page-cache references for all pages in the folio are dropped.  The folio
>>>> is then kept alive only by temporary external references, which allows a
>>>> later split to operate on a folio whose subpages are no longer protected
>>>> by page-cache references.
>>>>
>>>> Split:
>>>> After the page-cache references are gone, split_folio_to_order() can
>>>> split the big folio into individual pages and put the resulting subpages
>>>> back on the LRU.  For tail pages beyond EOF, split removes them from the
>>>> page cache and drops their page-cache references.  A tail page can then
>>>> remain on the LRU with PG_lru set while holding only the split caller's
>>>> temporary reference.  When free_folio_and_swap_cache() drops that final
>>>> reference, the page enters the final folio_put() release path.
>>>>
>>>> Isolate:
>>>> In parallel, folio_isolate_lru() can observe the same tail page with a
>>>> non-zero refcount and PG_lru set.  It clears PG_lru before taking its own
>>>> reference.  If this races with the final folio_put() from the split path,
>>>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
>>>> The page is then freed back to the allocator while its lru links are
>>>> still present in the LRU list.  A later LRU operation on a neighboring
>>>> page detects the stale link and reports list corruption.

Something is wrong here with the caller of folio_isolate_lru(), since
folio_isolate_lru() requires the caller to take an elevated refcount.
This means that on entry to folio_isolate_lru(), the EOF folio should
have a refcount of at least 2: one from folio_split() and one from the
caller of folio_isolate_lru(). This should prevent the EOF folio from
being freed by the parallel __folio_put().
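
A minimal user-space sketch of this refcount argument, assuming C11
atomics. model_try_get(), model_put(), model_isolate_lru() and replay()
are hypothetical stand-ins. They only mirror the rules discussed here:
the isolating caller holds its own reference, and PG_lru is cleared
before the isolation reference is taken. The two interleavings are
replayed sequentially in one thread, not on real CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model: only the refcount and PG_lru rules mirror the
 * kernel; freeing is reduced to a flag.
 */
struct model_folio {
	atomic_int ref;
	atomic_bool lru;
	bool freed;
};

/* Like folio_try_get(): pin the folio only while its refcount is non-zero. */
static bool model_try_get(struct model_folio *f)
{
	int old = atomic_load(&f->ref);

	while (old > 0)
		if (atomic_compare_exchange_weak(&f->ref, &old, old + 1))
			return true;
	return false;
}

/* Like folio_put(): dropping the last reference frees the folio. */
static void model_put(struct model_folio *f)
{
	if (atomic_fetch_sub(&f->ref, 1) == 1)
		f->freed = true;
}

/* Like folio_isolate_lru(): the caller must already hold a reference. */
static bool model_isolate_lru(struct model_folio *f)
{
	assert(atomic_load(&f->ref) > 0); /* the VM_BUG_ON_FOLIO() condition */
	if (!atomic_exchange(&f->lru, false))
		return false;
	atomic_fetch_add(&f->ref, 1);     /* isolation reference */
	return true;
}

/*
 * Replay, in one thread, the split caller's final put and an isolating
 * caller. Returns true if isolation ran on a live folio.
 */
static bool replay(bool pin_before_split_put)
{
	struct model_folio f;
	bool isolated = false;

	atomic_init(&f.ref, 1);  /* only the split caller's reference is left */
	atomic_init(&f.lru, true);
	f.freed = false;

	if (!pin_before_split_put)
		model_put(&f);       /* split's put: 1 -> 0, folio freed */

	if (model_try_get(&f)) { /* isolating caller's own reference */
		if (pin_before_split_put)
			model_put(&f);   /* split's put: 2 -> 1, folio survives */
		assert(!f.freed);
		isolated = model_isolate_lru(&f);
		if (isolated)
			model_put(&f);   /* drop the isolation reference (putback elided) */
		model_put(&f);       /* drop the caller's pin */
	}
	return isolated;
}
```

replay(true) isolates a live folio. replay(false) never reaches
isolation, because a caller cannot pin a folio whose refcount has
already dropped to zero.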

Hi Zhaoyang, can you elaborate on the folio_isolate_lru() caller?

In addition (with the help of Claude), the race trace [2] below
looks invalid. It says the split happens after folio_end_dropbehind(),
which sets folio->mapping to NULL, but __folio_split() returns -EBUSY
when folio->mapping is NULL (in the filemap_release_folio() check).
So the split cannot happen.
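
The mapping check can be sketched the same way. This is a hypothetical
model, not the __folio_split() code. It assumes only what is stated
above: removal from the page cache clears ->mapping, and a split of a
file-backed folio is refused with -EBUSY once that has happened.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the split precondition; not the kernel code. */
struct model_mapping {
	int nrpages;
};

struct model_folio {
	struct model_mapping *mapping; /* NULL once removed from the page cache */
};

/* Page-cache removal (as under folio_end_dropbehind()) clears ->mapping. */
static void model_remove_from_page_cache(struct model_folio *f)
{
	f->mapping = NULL;
}

/* Split of a file-backed folio: a truncated folio is refused up front. */
static int model_split(struct model_folio *f)
{
	if (!f->mapping)
		return -EBUSY;
	/* ... freeze, split and unfreeze would follow here ... */
	return 0;
}
```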

Now I am not sure whether the bug report is valid. At least for
folio_split() and folio_isolate_lru(), the race should not exist.
But let me know if I missed anything.

>>>
>>> Complicated mess :(
>>>
>>> So, folio_isolate_lru() really only requires the caller to hold a folio
>>> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
>>> for example, be triggered by memory offlining or page migration.
>>>
>>> So we really want to not allow folio_isolate_lru() while we are still processing
>>> the folio.
>>
>> Or we should defer adding split folios to LRU after unfreeze.
>>
>>>
>>> What your patch does is, simply not add folios that we will drop from the page
>>> cache to the LRU?
>>>
>>>
>>> You should describe here how you are fixing it: "Let's fix it by..."
>>>
>>>>
>>>> [1]
>>>> [   22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
>>>> [   22.486130] ------------[ cut here ]------------
>>>> [   22.486134] kernel BUG at lib/list_debug.c:67!
>>>> [   22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
>>>> [   22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
>>>> [   22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
>>>> [   22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> [   22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
>>>> [   22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
>>>> [   22.488539] sp : ffffffc08006b830
>>>> [   22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
>>>> [   22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
>>>> [   22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
>>>> [   22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
>>>> [   22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
>>>> [   22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
>>>> [   22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
>>>> [   22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
>>>> [   22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
>>>> [   22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
>>>> [   22.488647] Call trace:
>>>> [   22.488651]  __list_del_entry_valid_or_report+0x14c/0x154 (P)
>>>> [   22.488661]  __folio_put+0x2bc/0x434
>>>> [   22.488670]  folio_put+0x28/0x58
>>>> [   22.488678]  do_garbage_collect+0x1a34/0x2584
>>>> [   22.488689]  f2fs_gc+0x230/0x9b4
>>>> [   22.488697]  f2fs_fallocate+0xb90/0xdf4
>>>> [   22.488706]  vfs_fallocate+0x1b4/0x2bc
>>>> [   22.488716]  __arm64_sys_fallocate+0x44/0x78
>>>> [   22.488725]  invoke_syscall+0x58/0xe4
>>>> [   22.488732]  do_el0_svc+0x48/0xdc
>>>> [   22.488739]  el0_svc+0x3c/0x98
>>>> [   22.488747]  el0t_64_sync_handler+0x20/0x130
>>>> [   22.488754]  el0t_64_sync+0x1c4/0x1c8
>>>>
>>>> [2]
>>>> CPU0 (f2fs GC)              CPU1 (split_folio_to_order)          CPU2 (folio_isolate_lru)
>>>>
>>>> F: pagecache refs = n
>>>> F: extra refs = GC + split
>>>> F: PG_lru set
>>>> move_data_block()
>>>> folio = f2fs_grab_cache_folio(F)
>>>> ...
>>>> __folio_set_dropbehind(F)
>>>> folio_unlock(F)
>>>> folio_end_dropbehind(F)
>>>>   folio_unmap_invalidate(F)
>>>>     __filemap_remove_folio(F)
>>>>     folio_put_refs(F, n)
>>>> folio_put(F)
>>>>                             split_folio_to_order(F)
>>>>                               folio_ref_freeze(F, 1)
>>>>                               ...
>>>>                               lru_add_split_folio(T)
>>>>                                 list_add_tail(&T->lru, &F->lru)
>>>>                                 folio_set_lru(T)
>>>>                               __filemap_remove_folio(T)
>>>>                               folio_put_refs(T, 1)
>>>>                               /* T refcount == 1, PageLRU set */
>>>>                             free_folio_and_swap_cache(T)
>>>>                               folio_put(T)
>>>>                                 /* refcount: 1 -> 0 */
>>>>                                                                   folio_isolate_lru(T)
>>
>> If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in
>> folio_isolate_lru() would be triggered. Maybe we could just return false in that case.
>>
>>>>                                                                     folio_test_clear_lru(T)
>>>>                                 __folio_put(T)
>>>>                                   __page_cache_release(T)
>>>>                                     folio_test_lru(T) == false
>>>>                                     /* skip lruvec_del_folio(T) */
>>>>                                   free_frozen_pages(T)
>>>>                                                                   folio_get(T)
>>>>                                                                   lruvec_del_folio(T)
>>
>> But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from LRU list.
>>
>>>> later:
>>>>   list_del(adjacent->lru)
>>>>     next == &T->lru
>>>>     next->prev == LIST_POISON / PCP freelist
>>>>     BUG
>>>>
>>
>> Why does CPU0 still see the stale link from adjacent?
>>
>>>> Assisted-by: Cursor:claude-opus-4-8
>>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>
>>> I'm wondering if this has been broken the whole time, or if some rework allowed
>>> this to trigger.
>>>
>>> I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable?
>>>
>>> Looking into the history, I think we always unconditionally did the
>>> lru_add_split_folio()/lru_add_page_tail().
>>>
>>>> ---
>>>>  mm/huge_memory.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 970e077019b7..7465525a94a8 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>>>>  			folio_ref_unfreeze(new_folio,
>>>>  					   folio_cache_ref_count(new_folio) + 1);
>>>>
>>>> -			if (do_lru)
>>>> +			if (do_lru && !(mapping && new_folio->index >= end))
>>>
>>> It might be clearer to write this as
>>>
>>> 	do_lru && (!mapping || new_folio->index < end)
>>>
>>> To match the page-cache check further below
>>>
>>> 	if (!mapping)
>>> 		continue
>>>
>>> 	...
>>> 	if (new_folio->index < end)
>>> 		...
>>>
>>>>  				lru_add_split_folio(folio, new_folio, lruvec, list);
>
> Talked to Claude and found an accounting issue with this. If the
> after-split folios beyond EOF are not put back on the LRU, they never go
> through lruvec_del_folio(). That function decrements the NR_*_LRU
> counters when it removes a folio from the LRU, so skipping it leaves the
> NR_*_LRU accounting wrong. Note that the original folio stays on the LRU
> the whole time and the LRU counters are not modified during the split.
> The split only shrinks the original folio, so the after-split folios must
> be added back to the LRU to keep the counters right. If we decide not to
> add them back, we will need to adjust the LRU accounting for the folios
> that fail the (!mapping || new_folio->index < end) check.
>
>
> Best Regards,
> Yan, Zi


Best Regards,
Yan, Zi



end of thread

Thread overview: 5+ messages
2026-06-10 12:05 [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU zhaoyang.huang
2026-06-10 12:50 ` David Hildenbrand (Arm)
2026-06-10 14:38   ` Zi Yan
2026-06-10 17:25     ` Zi Yan
2026-06-10 18:44       ` Zi Yan
