* [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
@ 2026-06-10 12:05 zhaoyang.huang
From: zhaoyang.huang @ 2026-06-10 12:05 UTC
To: Andrew Morton, David Hildenbrand, Zi Yan, Lorenzo Stoakes,
Barry Song, Baolin Wang, Lance Yang, Liam R . Howlett, Nico Pache,
Ryan Roberts, Dev Jain, linux-mm, linux-kernel, Zhaoyang Huang,
steve.kang
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Kernel panics keep being reported, especially when the f2fs partition is
almost full. Our investigation found that an f2fs page was freed to the
buddy allocator without first being deleted from the LRU, and that the
root cause is the race shown in [2], which started to occur after f2fs
commit 9609dd704725 ("f2fs: remove non-uptodate folio from the page cache
in move_data_block"). Reverting that commit makes the panics go away.

There are three racing paths in this scenario; their main activities are
described below. However, on further inspection of the code, I think there
is a general race window for truncated folios between
split_folio_to_order() and folio_isolate_lru(): once such folios have lost
their page-cache references, they are held only by the split caller's
transient reference, so they can enter the free path while the isolation
path is still working on them. This patch proposes keeping the split
folios beyond EOF off the LRU.

Truncate:
The code that commit 9609dd704725 changed in move_data_block() lets the GC
path evict the tail-end folio from the page cache through
folio_end_dropbehind(). Once folio_unmap_invalidate() removes the folio
from mapping->i_pages, the page-cache references for all pages in the folio
are dropped. The folio is then kept alive only by temporary external
references, which allows a later split to operate on a folio whose
subpages are no longer protected by page-cache references.
Split:
After the page-cache references are gone, split_folio_to_order() can
split the big folio into individual pages and put the resulting subpages
back on the LRU. For tail pages beyond EOF, split removes them from the
page cache and drops their page-cache references. A tail page can then
remain on the LRU with PG_lru set while holding only the split caller's
temporary reference. When free_folio_and_swap_cache() drops that final
reference, the page enters the final folio_put() release path.
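
To make the "beyond EOF" condition above concrete, the following is a
minimal user-space sketch, not kernel code. It assumes that `end` is the
first page index at or past EOF, i.e. the file size rounded up to whole
pages; `PAGE_SZ`, `eof_page_index()`, `split_piece_beyond_eof()` and
`count_dropped_pieces()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page size for the sketch; the kernel uses PAGE_SIZE. */
#define PAGE_SZ 4096UL

/*
 * First page index at or beyond EOF: the file size rounded up to whole
 * pages. Assumption: mirrors a DIV_ROUND_UP(i_size, PAGE_SIZE) style value.
 */
static unsigned long eof_page_index(unsigned long long i_size)
{
	return (unsigned long)((i_size + PAGE_SZ - 1) / PAGE_SZ);
}

/*
 * A split piece at page-cache index @index is dropped from the page cache
 * (and, with the proposed patch, kept off the LRU) when it starts at or
 * beyond @end. Anonymous folios (no mapping) are never dropped this way.
 */
static bool split_piece_beyond_eof(bool has_mapping, unsigned long index,
				   unsigned long end)
{
	return has_mapping && index >= end;
}

/* Count how many of the @nr order-0 pieces starting at @first are dropped. */
static size_t count_dropped_pieces(bool has_mapping, unsigned long first,
				   size_t nr, unsigned long end)
{
	size_t dropped = 0;

	for (size_t i = 0; i < nr; i++)
		if (split_piece_beyond_eof(has_mapping, first + i, end))
			dropped++;
	return dropped;
}
```

For example, with 4 KiB pages a 10000-byte file gives end = 3, so
splitting a 16-page folio at index 0 drops the 13 pieces at indices 3..15,
while an anonymous folio drops none.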
Isolate:
In parallel, folio_isolate_lru() can observe the same tail page with a
non-zero refcount and PG_lru set. It clears PG_lru before taking its own
reference. If this races with the final folio_put() from the split path,
__folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
The page is then freed back to the allocator while its lru links are
still present in the LRU list. A later LRU operation on a neighboring
page detects the stale link and reports list corruption.
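
The interleaving described under Isolate can be replayed single-threaded
with a toy model. This is only a sketch under stated assumptions:
`struct toy_folio` stands in for a folio, `lru_flag` for PG_lru, and
`toy_final_put()` mimics the behaviour described above for the final put,
which unlinks the folio only if the flag is still set. None of these names
exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct toy_list {
	struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *head)
{
	head->next = head->prev = head;
}

static void toy_list_add_tail(struct toy_list *node, struct toy_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void toy_list_del(struct toy_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL;
}

/* Is @node currently reachable from @head? */
static bool toy_list_contains(const struct toy_list *head,
			      const struct toy_list *node)
{
	for (const struct toy_list *p = head->next; p != head; p = p->next)
		if (p == node)
			return true;
	return false;
}

/* Toy folio: a refcount, a PG_lru-like flag and LRU links. */
struct toy_folio {
	int refcount;
	bool lru_flag;
	struct toy_list lru;
};

/*
 * Final put: unlink from the LRU only if the lru flag is still set.
 * Returns true if the folio reaches "freed" while still linked on @lru.
 */
static bool toy_final_put(struct toy_folio *f, const struct toy_list *lru)
{
	if (--f->refcount > 0)
		return false;		/* not the last reference */
	if (f->lru_flag) {
		f->lru_flag = false;
		toy_list_del(&f->lru);
	}
	return toy_list_contains(lru, &f->lru);
}

/*
 * Split tail T sits on the LRU holding only the split caller's reference.
 * If @isolate_first, an isolator clears the flag (as folio_test_clear_lru()
 * would) before the split caller drops that last reference.
 */
static bool replay_split_vs_isolate(bool isolate_first)
{
	struct toy_list lru;
	struct toy_folio neighbour = { .refcount = 1 };
	struct toy_folio tail = { .refcount = 1 };

	toy_list_init(&lru);
	toy_list_add_tail(&neighbour.lru, &lru);
	neighbour.lru_flag = true;
	toy_list_add_tail(&tail.lru, &lru);	/* lru_add_split_folio(T) */
	tail.lru_flag = true;

	if (isolate_first)
		tail.lru_flag = false;		/* CPU2 clears the flag */

	return toy_final_put(&tail, &lru);	/* CPU1: last folio_put(T) */
}
```

With `isolate_first` set, the tail is released while still linked on the
LRU, which is the stale-link state the crash in [1] reports; without the
concurrent flag clear, the release path unlinks it first.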
[1]
[ 22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
[ 22.486130] ------------[ cut here ]------------
[ 22.486134] kernel BUG at lib/list_debug.c:67!
[ 22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
[ 22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
[ 22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
[ 22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
[ 22.488539] sp : ffffffc08006b830
[ 22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
[ 22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
[ 22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
[ 22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
[ 22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
[ 22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
[ 22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
[ 22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
[ 22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
[ 22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
[ 22.488647] Call trace:
[ 22.488651] __list_del_entry_valid_or_report+0x14c/0x154 (P)
[ 22.488661] __folio_put+0x2bc/0x434
[ 22.488670] folio_put+0x28/0x58
[ 22.488678] do_garbage_collect+0x1a34/0x2584
[ 22.488689] f2fs_gc+0x230/0x9b4
[ 22.488697] f2fs_fallocate+0xb90/0xdf4
[ 22.488706] vfs_fallocate+0x1b4/0x2bc
[ 22.488716] __arm64_sys_fallocate+0x44/0x78
[ 22.488725] invoke_syscall+0x58/0xe4
[ 22.488732] do_el0_svc+0x48/0xdc
[ 22.488739] el0_svc+0x3c/0x98
[ 22.488747] el0t_64_sync_handler+0x20/0x130
[ 22.488754] el0t_64_sync+0x1c4/0x1c8
[2]
CPU0 (f2fs GC)                  CPU1 (split_folio_to_order)     CPU2 (folio_isolate_lru)

F: pagecache refs = n
F: extra refs = GC + split
F: PG_lru set
move_data_block()
  folio = f2fs_grab_cache_folio(F)
  ...
  __folio_set_dropbehind(F)
  folio_unlock(F)
  folio_end_dropbehind(F)
    folio_unmap_invalidate(F)
      __filemap_remove_folio(F)
      folio_put_refs(F, n)
  folio_put(F)
                                split_folio_to_order(F)
                                  folio_ref_freeze(F, 1)
                                  ...
                                  lru_add_split_folio(T)
                                    list_add_tail(&T->lru, &F->lru)
                                    folio_set_lru(T)
                                  __filemap_remove_folio(T)
                                  folio_put_refs(T, 1)
                                  /* T refcount == 1, PageLRU set */
                                  free_folio_and_swap_cache(T)
                                    folio_put(T)
                                    /* refcount: 1 -> 0 */
                                                                folio_isolate_lru(T)
                                                                  folio_test_clear_lru(T)
                                      __folio_put(T)
                                        __page_cache_release(T)
                                          folio_test_lru(T) == false
                                          /* skip lruvec_del_folio(T) */
                                        free_frozen_pages(T)
                                                                  folio_get(T)
                                                                  lruvec_del_folio(T)
later:
  list_del(adjacent->lru)
    next == &T->lru
    next->prev == LIST_POISON / PCP freelist
    BUG

Assisted-by: Cursor:claude-opus-4-8
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..7465525a94a8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 		folio_ref_unfreeze(new_folio,
 				   folio_cache_ref_count(new_folio) + 1);
 
-		if (do_lru)
+		if (do_lru && !(mapping && new_folio->index >= end))
 			lru_add_split_folio(folio, new_folio, lruvec, list);
 
 		/*
--
2.25.1

* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
From: David Hildenbrand (Arm) @ 2026-06-10 12:50 UTC
To: zhaoyang.huang, Andrew Morton, Zi Yan, Lorenzo Stoakes,
Barry Song, Baolin Wang, Lance Yang, Liam R . Howlett, Nico Pache,
Ryan Roberts, Dev Jain, linux-mm, linux-kernel, Zhaoyang Huang,
steve.kang

On 6/10/26 14:05, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> The kernel panics are keeping to be reported especially when the f2fs
> partition get almost full. By investigation, we find that the reason is
> one f2fs page got freed to buddy without being deleted from LRU and the
> root cause is the race happened in [2] which is enrolled by this commit.
> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove
> non-uptodate folio from the page cache in move_data_block").

But I assume that other FSes can trigger this as well? Any insights?

>
> There are 3 race processes in this scenario, please find below for their
> main activities. [...]
>
> [...]
>
> Isolate:
> In parallel, folio_isolate_lru() can observe the same tail page with a
> non-zero refcount and PG_lru set. It clears PG_lru before taking its own
> reference. If this races with the final folio_put() from the split path,
> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
> The page is then freed back to the allocator while its lru links are
> still present in the LRU list. A later LRU operation on a neighboring
> page detects the stale link and reports list corruption.

Complicated mess :(

So, folio_isolate_lru() really only requires the caller to hold a folio
reference, which can happen given that we did the folio_ref_unfreeze(). It can,
for example, be triggered by memory offlining or page migration.

So we really want to not allow folio_isolate_lru() while we are still processing
the folio.

What your patch does is simply not add folios that we will drop from the page
cache to the LRU?

You should describe here how you are fixing it: "Let's fix it by..."

>
> [1]
> [...]
>
> [2]
> [...]
>
> Assisted-by: Cursor:claude-opus-4-8
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

I'm wondering if this has been broken the whole time, or if some rework allowed
this to trigger.

I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable?

Looking into the history, I think we always unconditionally did the
lru_add_split_folio()/lru_add_page_tail().

> ---
> mm/huge_memory.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 970e077019b7..7465525a94a8 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> folio_ref_unfreeze(new_folio,
> folio_cache_ref_count(new_folio) + 1);
>
> - if (do_lru)
> + if (do_lru && !(mapping && new_folio->index >= end))

It might be clearer to write this as

	do_lru && (!mapping || new_folio->index < end)

To match the page-cache check further below

	if (!mapping)
		continue

	...
	if (new_folio->index < end)
		...

> lru_add_split_folio(folio, new_folio, lruvec, list);
>
> /*

folio_check_splittable() makes sure that we have a mapping for non-anon folios.
(no truncation). end is then only set for non-anon folios.

@Zi, any thoughts?

-- 
Cheers,

David
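
The suggested rewrite of the condition is the De Morgan form of the one in
the patch: !(a && b) is (!a || !b), and !(index >= end) is (index < end).
The throwaway check below compares both forms exhaustively over a small
domain; `cond_patch()`, `cond_suggested()` and `count_mismatches()` are
hypothetical helpers, with `mapping` reduced to a bool.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as written in the RFC patch. */
static bool cond_patch(bool do_lru, bool has_mapping, unsigned long index,
		       unsigned long end)
{
	return do_lru && !(has_mapping && index >= end);
}

/* Condition as suggested in review, matching the later page-cache checks. */
static bool cond_suggested(bool do_lru, bool has_mapping, unsigned long index,
			   unsigned long end)
{
	return do_lru && (!has_mapping || index < end);
}

/* Exhaustively compare both forms over a small domain; returns mismatches. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int lru = 0; lru <= 1; lru++)
		for (int map = 0; map <= 1; map++)
			for (unsigned long idx = 0; idx < 8; idx++)
				for (unsigned long end = 0; end < 8; end++)
					if (cond_patch(lru, map, idx, end) !=
					    cond_suggested(lru, map, idx, end))
						mismatches++;
	return mismatches;
}
```

Both forms agree everywhere, so the choice between them is purely about
readability next to the existing `if (!mapping)` / `index < end` checks.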

* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
From: Zi Yan @ 2026-06-10 14:38 UTC
To: David Hildenbrand (Arm), zhaoyang.huang
Cc: Andrew Morton, Lorenzo Stoakes, Barry Song, Baolin Wang, Lance Yang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, linux-mm,
linux-kernel, Zhaoyang Huang, steve.kang

On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote:

> On 6/10/26 14:05, zhaoyang.huang wrote:
>> [...]
>
> Complicated mess :(
>
> So, folio_isolate_lru() really only requires the caller to hold a folio
> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
> for example, be triggered by memory offlining or page migration.
>
> So we really want to not allow folio_isolate_lru() while we are still processing
> the folio.

Or we should defer adding split folios to LRU after unfreeze.

>
> What your patch does is simply not add folios that we will drop from the page
> cache to the LRU?
>
> You should describe here how you are fixing it: "Let's fix it by..."
>
>>
>> [1]
>> [...]
>>
>> [2]
>> CPU0 (f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru)
>> [...]
>> free_folio_and_swap_cache(T)
>> folio_put(T)
>> /* refcount: 1 -> 0 */
>> folio_isolate_lru(T)

If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in
folio_isolate_lru() would be triggered. Maybe we could just return false in that case.

>> folio_test_clear_lru(T)
>> __folio_put(T)
>> __page_cache_release(T)
>> folio_test_lru(T) == false
>> /* skip lruvec_del_folio(T) */
>> free_frozen_pages(T)
>> folio_get(T)
>> lruvec_del_folio(T)

But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from the LRU list.

>> later:
>> list_del(adjacent->lru)
>> next == &T->lru
>> next->prev == LIST_POISON / PCP freelist
>> BUG
>>

Why does CPU0 still see the stale link from adjacent?

>> Assisted-by: Cursor:claude-opus-4-8
>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> I'm wondering if this has been broken the whole time, or if some rework allowed
> this to trigger.
>
> I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable?
>
> Looking into the history, I think we always unconditionally did the
> lru_add_split_folio()/lru_add_page_tail().
>
>> [...]
>> - if (do_lru)
>> + if (do_lru && !(mapping && new_folio->index >= end))
>
> It might be clearer to write this as
>
> do_lru && (!mapping || new_folio->index < end)
>
> [...]
>
>> lru_add_split_folio(folio, new_folio, lruvec, list);
>>
>> /*
>
> folio_check_splittable() makes sure that we have a mapping for non-anon folios.
> (no truncation). end is then only set for non-anon folios.
>
> @Zi, any thoughts?

The fix works, but I feel that it is masking the race between folio_isolate_lru()
and folio_put(). I worry that the same issue might be triggered in other ways or
in new code if we do not fix the race.

To summarize my thoughts above:

1. Adding frozen folios to the LRU might be problematic, since folio_isolate_lru()
has a VM_BUG_ON_FOLIO() for it but still proceeds with the isolation.

2. The race analysis is not clear, since both folio_isolate_lru() and folio_put()
do lruvec_del_folio() if the folio is on the LRU. When list_del(adjacent->lru) sees
the stale link, the folio is already in buddy and page->lru is modified for
PageBuddy use? So even without CPU0, folio_isolate_lru()'s lruvec_del_folio() can
do the wrong thing on pages in buddy?

-- 
Best Regards,
Yan, Zi
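
The "just return false" idea for a zero refcount can be sketched as
follows. This is a hypothetical, single-threaded model (`struct toy_folio`
and `toy_isolate_lru()` are invented names) that follows the steps of
folio_isolate_lru() as described in this thread: clear the lru flag, take a
reference, then delete from the LRU. In the kernel a plain refcount check
is only meaningful if the caller already holds a reference; otherwise the
count can still drop to zero right after the check, so this shows the
control flow, not a race-free fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy folio: refcount, PG_lru-like flag, and whether it is on the LRU list. */
struct toy_folio {
	int refcount;
	bool lru_flag;
	bool on_list;
};

/*
 * Hypothetical isolate helper: bail out instead of asserting when the
 * folio has no references left, otherwise clear the flag, take a
 * reference and take the folio off the LRU list.
 */
static bool toy_isolate_lru(struct toy_folio *f)
{
	if (f->refcount == 0)
		return false;		/* folio is already on its way to being freed */
	if (!f->lru_flag)
		return false;		/* someone else owns the LRU state */
	f->lru_flag = false;		/* folio_test_clear_lru() */
	f->refcount++;			/* folio_get() */
	f->on_list = false;		/* lruvec_del_folio() */
	return true;
}
```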

* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
From: Zi Yan @ 2026-06-10 17:25 UTC
To: David Hildenbrand (Arm), zhaoyang.huang
Cc: Andrew Morton, Lorenzo Stoakes, Barry Song, Baolin Wang, Lance Yang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, linux-mm,
linux-kernel, Zhaoyang Huang, steve.kang

On 10 Jun 2026, at 10:38, Zi Yan wrote:

> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote:
>
>> On 6/10/26 14:05, zhaoyang.huang wrote:
>>> [...]
>>> - if (do_lru)
>>> + if (do_lru && !(mapping && new_folio->index >= end))
>>
>> It might be clearer to write this as
>>
>> do_lru && (!mapping || new_folio->index < end)
>>
>> [...]
>>
>>> lru_add_split_folio(folio, new_folio, lruvec, list);

Talked to Claude and found an accounting issue with this.

If the after-split folios beyond EOF are not put back on the LRU, they never go
through lruvec_del_folio(), which decreases the NR_*_LRU counters along with
removing the folio from the LRU. This causes NR_*_LRU accounting errors.

Note that the original folio stays on the LRU the whole time and the LRU counters
are not modified. After the split, the original folio's size decreases, so the
after-split folios need to be added back to the LRU to keep the LRU counters right.

We will need to adjust the LRU accounting for (!mapping || new_folio->index < end)
if we decide not to add them back to the LRU.

Best Regards,
Yan, Zi
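
The accounting concern can be illustrated with a toy counter. This is a
hypothetical model, not the kernel's lruvec code: `struct toy_lruvec` and
`toy_split_then_drain()` are invented names, and the sketch only assumes
what is described above, namely that the big folio's pages were counted
once when it went on the LRU and that each piece decrements the counter
when it is later removed from the LRU.

```c
#include <assert.h>

/* Toy per-lruvec page counter standing in for the NR_*_LRU statistics. */
struct toy_lruvec {
	long nr_pages;
};

/* Adding/removing a folio adjusts the counter by its page count. */
static void toy_add(struct toy_lruvec *v, long nr) { v->nr_pages += nr; }
static void toy_del(struct toy_lruvec *v, long nr) { v->nr_pages -= nr; }

/*
 * A folio of @nr pages is on the LRU and is split into order-0 pieces
 * without touching the counter. Only the first @keep pieces go back on the
 * LRU; the rest are dropped. If @fix_accounting is zero, the dropped pages
 * are simply forgotten, as in the RFC. Every kept piece is later removed.
 * Returns the counter once the LRU should be empty (0 means consistent).
 */
static long toy_split_then_drain(long nr, long keep, int fix_accounting)
{
	struct toy_lruvec v = { 0 };

	toy_add(&v, nr);			/* big folio added to the LRU */
	if (fix_accounting)
		toy_del(&v, nr - keep);		/* account for the dropped tails */
	for (long i = 0; i < keep; i++)
		toy_del(&v, 1);			/* kept pieces removed one by one */
	return v.nr_pages;
}
```

Splitting a 16-page folio and keeping only 3 pieces on the LRU leaves the
counter at 13 once the LRU has drained, unless the 13 dropped pages are
subtracted explicitly.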
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU 2026-06-10 17:25 ` Zi Yan @ 2026-06-10 18:44 ` Zi Yan 2026-06-11 1:19 ` Zhaoyang Huang 0 siblings, 1 reply; 16+ messages in thread From: Zi Yan @ 2026-06-10 18:44 UTC (permalink / raw) To: David Hildenbrand (Arm), zhaoyang.huang Cc: Andrew Morton, Lorenzo Stoakes, Barry Song, Baolin Wang, Lance Yang, Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, linux-mm, linux-kernel, Zhaoyang Huang, steve.kang On 10 Jun 2026, at 13:25, Zi Yan wrote: > On 10 Jun 2026, at 10:38, Zi Yan wrote: > >> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote: >> >>> On 6/10/26 14:05, zhaoyang.huang wrote: >>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com> >>>> >>>> The kernel panics are keeping to be reported especially when the f2fs >>>> partition get almost full. By investigation, we find that the reason is >>>> one f2fs page got freed to buddy without being deleted from LRU and the >>>> root cause is the race happened in [2] which is enrolled by this commit. >>>> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove >>>> non-uptodate folio from the page cache in move_data_block"). >>> >>> But I assume, that other FSes can trigger this as well? Any insights? >>> >>>> >>>> There are 3 race processes in this scenario, please find below for their >>>> main activities. However, by further investigation over the code, I >>>> think there is a common race window for the truncated folios between >>>> split_folio_to_order and folio_isolate_lru, where the folios lost the >>>> refcount on page cache and remains the transient one of the split >>>> caller, under which the folio could enter free path and compete with the >>>> isolation process. This commit would like to suggest to have the folios >>>> beyond EOF stay out of LRU. >>>> >>>> Truncate: >>>> The changed code in move_data_block() lets the GC path evict the tail-end >>>> folio from the page cache through folio_end_dropbehind(). 
Once
>>>> folio_unmap_invalidate() removes the folio from mapping->i_pages, the
>>>> page-cache references for all pages in the folio are dropped. The folio
>>>> is then kept alive only by temporary external references, which allows a
>>>> later split to operate on a folio whose subpages are no longer protected
>>>> by page-cache references.
>>>>
>>>> Split:
>>>> After the page-cache references are gone, split_folio_to_order() can
>>>> split the big folio into individual pages and put the resulting subpages
>>>> back on the LRU. For tail pages beyond EOF, split removes them from the
>>>> page cache and drops their page-cache references. A tail page can then
>>>> remain on the LRU with PG_lru set while holding only the split caller's
>>>> temporary reference. When free_folio_and_swap_cache() drops that final
>>>> reference, the page enters the final folio_put() release path.
>>>>
>>>> Isolate:
>>>> In parallel, folio_isolate_lru() can observe the same tail page with a
>>>> non-zero refcount and PG_lru set. It clears PG_lru before taking its own
>>>> reference. If this races with the final folio_put() from the split path,
>>>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
>>>> The page is then freed back to the allocator while its lru links are
>>>> still present in the LRU list. A later LRU operation on a neighboring
>>>> page detects the stale link and reports list corruption.

Something is wrong here with the caller of folio_isolate_lru(), since
folio_isolate_lru() requires the caller to hold an elevated refcount.
This means that when entering folio_isolate_lru(), the EOF folio should
have a refcount of at least 2: 1 from folio_split() and 1 from the caller
of folio_isolate_lru(). This should prevent the EOF folio from being freed
by the parallel __folio_put().

Hi Zhaoyang, can you elaborate on the folio_isolate_lru() caller?

In addition (with the help of Claude), the race trace [2] below looks
invalid. It says the split happens after folio_end_dropbehind(), which
sets folio->mapping to NULL, but __folio_split() returns -EBUSY at the
filemap_release_folio() check when folio->mapping is NULL. So the split
cannot happen.

Now I am not sure whether the bug report is valid. At least for
folio_split() and folio_isolate_lru(), the race should not exist. But
let me know if I missed anything.

>>>
>>> Complicated mess :(
>>>
>>> So, folio_isolate_lru() really only requires the caller to hold a folio
>>> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
>>> for example, be triggered by memory offlining or page migration.
>>>
>>> So we really want to not allow folio_isolate_lru() while we are still processing
>>> the folio.
>>
>> Or we should defer adding split folios to LRU after unfreeze.
>>
>>>
>>> What your patch does is, simply not add folios that we will drop from the page
>>> cache to the LRU?
>>>
>>>
>>> You should describe here how you are fixing it: "Let's fix it by..."
>>>
>>>>
>>>> [1]
>>>> [ 22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
>>>> [ 22.486130] ------------[ cut here ]------------
>>>> [ 22.486134] kernel BUG at lib/list_debug.c:67!
>>>> [ 22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP >>>> [ 22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE >>>> [ 22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT) >>>> [ 22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >>>> [ 22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154 >>>> [ 22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154 >>>> [ 22.488539] sp : ffffffc08006b830 >>>> [ 22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000 >>>> [ 22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0 >>>> [ 22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122 >>>> [ 22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060 >>>> [ 22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058 >>>> [ 22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003 >>>> [ 22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00 >>>> [ 22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c >>>> [ 22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010 >>>> [ 22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d >>>> [ 22.488647] Call trace: >>>> [ 22.488651] __list_del_entry_valid_or_report+0x14c/0x154 (P) >>>> [ 22.488661] __folio_put+0x2bc/0x434 >>>> [ 22.488670] folio_put+0x28/0x58 >>>> [ 22.488678] do_garbage_collect+0x1a34/0x2584 >>>> [ 22.488689] f2fs_gc+0x230/0x9b4 >>>> [ 22.488697] f2fs_fallocate+0xb90/0xdf4 >>>> [ 22.488706] vfs_fallocate+0x1b4/0x2bc >>>> [ 22.488716] __arm64_sys_fallocate+0x44/0x78 >>>> [ 22.488725] invoke_syscall+0x58/0xe4 >>>> [ 22.488732] do_el0_svc+0x48/0xdc >>>> [ 22.488739] el0_svc+0x3c/0x98 >>>> [ 22.488747] el0t_64_sync_handler+0x20/0x130 >>>> [ 22.488754] el0t_64_sync+0x1c4/0x1c8 >>>> >>>> [2] >>>> CPU0 (f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru) >>>> 
>>>> F: pagecache refs = n >>>> F: extra refs = GC + split >>>> F: PG_lru set >>>> move_data_block() >>>> folio = f2fs_grab_cache_folio(F) >>>> ... >>>> __folio_set_dropbehind(F) >>>> folio_unlock(F) >>>> folio_end_dropbehind(F) >>>> folio_unmap_invalidate(F) >>>> __filemap_remove_folio(F) >>>> folio_put_refs(F, n) >>>> folio_put(F) >>>> split_folio_to_order(F) >>>> folio_ref_freeze(F, 1) >>>> ... >>>> lru_add_split_folio(T) >>>> list_add_tail(&T->lru, &F->lru) >>>> folio_set_lru(T) >>>> __filemap_remove_folio(T) >>>> folio_put_refs(T, 1) >>>> /* T refcount == 1, PageLRU set */ >>>> free_folio_and_swap_cache(T) >>>> folio_put(T) >>>> /* refcount: 1 -> 0 */ >>>> folio_isolate_lru(T) >> >> If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in >> folio_isolate_lru() would be triggered. Maybe we could just return false in that case. >> >>>> folio_test_clear_lru(T) >>>> __folio_put(T) >>>> __page_cache_release(T) >>>> folio_test_lru(T) == false >>>> /* skip lruvec_del_folio(T) */ >>>> free_frozen_pages(T) >>>> folio_get(T) >>>> lruvec_del_folio(T) >> >> But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from LRU list. >> >>>> later: >>>> list_del(adjacent->lru) >>>> next == &T->lru >>>> next->prev == LIST_POISON / PCP freelist >>>> BUG >>>> >> >> Why does CPU0 still see the stale link from adjacent? >> >>>> Assisted-by: Cursor:claude-opus-4-8 >>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> >>> >>> I'm wondering if this has been broken the whole time, or if some rework allowed >>> this to trigger. >>> >>> I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable? >>> >>> Looking into the history, I think we always unconditionally did the >>> lru_add_split_folio()/lru_add_page_tail(). 
>>> >>>> --- >>>> mm/huge_memory.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >>>> index 970e077019b7..7465525a94a8 100644 >>>> --- a/mm/huge_memory.c >>>> +++ b/mm/huge_memory.c >>>> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n >>>> folio_ref_unfreeze(new_folio, >>>> folio_cache_ref_count(new_folio) + 1); >>>> >>>> - if (do_lru) >>>> + if (do_lru && !(mapping && new_folio->index >= end)) >>> >>> It might be clearer to write this as >>> >>> do_lru && (!mapping || new_folio->index < end) >>> >>> To match the page-cache check further below >>> >>> if (!mapping) >>> continue >>> >>> ... >>> if (new_folio->index < end) >>> ... >>> >>>> lru_add_split_folio(folio, new_folio, lruvec, list); > > Talked to Claude and find an accounting issue with this. Without putting > EOF after-split folios back to LRU, they are not going through lruvec_del_folio(), > which decreases NR_*_LRU counter along with removing the folio from LRU > and it causes NR_*_LRU accounting errors. Note that the original folio > is on LRU all the time and LRU counters are not modified and after the split > the original folio size is decreased and the after-split folios need to > be added back to LRU to keep the LRU counters right. We will need to adjust > LRU accounting for (!mapping || new_folio->index < end) if we decide to > not add them back to LRU. > > > Best Regards, > Yan, Zi Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 16+ messages in thread
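Zi Yan's question about the stale link is easier to read with the list poison values in mind. list_del() overwrites the removed entry's ->next and ->prev with LIST_POISON1 and LIST_POISON2, which on a 64-bit arm64 build are 0xdead000000000100 and 0xdead000000000122. The report in [1] ("next->prev should be fffffffec10e0ac8, but was dead000000000122") therefore most likely means that the neighbour at fffffffec10e0a88 had already been unlinked with list_del() while the entry being deleted still pointed at it. The sketch below replays that shape in user space by forcing a lost update of one ->next pointer; the toy_* names are hypothetical, the check is only loosely modeled on the CONFIG_DEBUG_LIST checks in lib/list_debug.c, and the poison delta is an assumption chosen to match the trace.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Assumed poison values: 0x100/0x122 plus a 0xdead000000000000 delta,
 * as on a 64-bit arm64 build. They are compared, never dereferenced.
 */
#define TOY_POISON_DELTA ((uintptr_t)0xdead000000000000ULL)
#define TOY_LIST_POISON1 ((struct list_head *)(0x100 + TOY_POISON_DELTA))
#define TOY_LIST_POISON2 ((struct list_head *)(0x122 + TOY_POISON_DELTA))

enum toy_del_result {
	TOY_DEL_OK,
	TOY_DEL_POISONED,	/* the entry itself was already deleted */
	TOY_DEL_BAD_PREV,	/* prev->next != entry */
	TOY_DEL_BAD_NEXT,	/* next->prev != entry */
};

void toy_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

void toy_list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Sanity checks before unlinking; *seen gets the unexpected pointer. */
enum toy_del_result toy_list_del_check(const struct list_head *entry,
				       const struct list_head **seen)
{
	const struct list_head *prev = entry->prev, *next = entry->next;

	if (next == TOY_LIST_POISON1 || prev == TOY_LIST_POISON2)
		return TOY_DEL_POISONED;
	if (prev->next != entry) {
		*seen = prev->next;
		return TOY_DEL_BAD_PREV;
	}
	if (next->prev != entry) {
		*seen = next->prev;
		return TOY_DEL_BAD_NEXT;
	}
	return TOY_DEL_OK;
}

/* Unlink and poison, as list_del() does. */
void toy_list_del(struct list_head *entry)
{
	assert(entry->next != TOY_LIST_POISON1);	/* double delete in the model */
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = TOY_LIST_POISON1;
	entry->prev = TOY_LIST_POISON2;
}

/*
 * Build head <-> a <-> b <-> c, delete b, then let a keep a stale ->next
 * that still points at b, and report what deleting a would detect.
 */
enum toy_del_result toy_stale_link_demo(const struct list_head **seen)
{
	struct list_head head, a, b, c;
	struct list_head *stale_next;

	toy_list_init(&head);
	toy_list_add_tail(&a, &head);
	toy_list_add_tail(&b, &head);
	toy_list_add_tail(&c, &head);

	stale_next = a.next;	/* &b */
	toy_list_del(&b);	/* a.next is now &c */
	a.next = stale_next;	/* lost update: the stale link survives */

	return toy_list_del_check(&a, seen);
}
```

In the model, toy_stale_link_demo() reports TOY_DEL_BAD_NEXT with the observed value equal to TOY_LIST_POISON2, the same pattern as [1], while a correctly maintained list deletes cleanly.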
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU 2026-06-10 18:44 ` Zi Yan @ 2026-06-11 1:19 ` Zhaoyang Huang 2026-06-11 1:49 ` Zi Yan 0 siblings, 1 reply; 16+ messages in thread From: Zhaoyang Huang @ 2026-06-11 1:19 UTC (permalink / raw) To: Zi Yan Cc: David Hildenbrand (Arm), zhaoyang.huang, Andrew Morton, Lorenzo Stoakes, Barry Song, Baolin Wang, Lance Yang, Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, linux-mm, linux-kernel, steve.kang, xiuhong.wang@unisoc.com, hao_hao.wang On Thu, Jun 11, 2026 at 2:44 AM Zi Yan <ziy@nvidia.com> wrote: > > On 10 Jun 2026, at 13:25, Zi Yan wrote: > > > On 10 Jun 2026, at 10:38, Zi Yan wrote: > > > >> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote: > >> > >>> On 6/10/26 14:05, zhaoyang.huang wrote: > >>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com> > >>>> > >>>> The kernel panics are keeping to be reported especially when the f2fs > >>>> partition get almost full. By investigation, we find that the reason is > >>>> one f2fs page got freed to buddy without being deleted from LRU and the > >>>> root cause is the race happened in [2] which is enrolled by this commit. > >>>> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove > >>>> non-uptodate folio from the page cache in move_data_block"). > >>> > >>> But I assume, that other FSes can trigger this as well? Any insights? > >>> > >>>> > >>>> There are 3 race processes in this scenario, please find below for their > >>>> main activities. However, by further investigation over the code, I > >>>> think there is a common race window for the truncated folios between > >>>> split_folio_to_order and folio_isolate_lru, where the folios lost the > >>>> refcount on page cache and remains the transient one of the split > >>>> caller, under which the folio could enter free path and compete with the > >>>> isolation process. This commit would like to suggest to have the folios > >>>> beyond EOF stay out of LRU. 
> >>>> > >>>> Truncate: > >>>> The changed code in move_data_block() lets the GC path evict the tail-end > >>>> folio from the page cache through folio_end_dropbehind(). Once > >>>> folio_unmap_invalidate() removes the folio from mapping->i_pages, the > >>>> page-cache references for all pages in the folio are dropped. The folio > >>>> is then kept alive only by temporary external references, which allows a > >>>> later split to operate on a folio whose subpages are no longer protected > >>>> by page-cache references. > >>>> > >>>> Split: > >>>> After the page-cache references are gone, split_folio_to_order() can > >>>> split the big folio into individual pages and put the resulting subpages > >>>> back on the LRU. For tail pages beyond EOF, split removes them from the > >>>> page cache and drops their page-cache references. A tail page can then > >>>> remain on the LRU with PG_lru set while holding only the split caller's > >>>> temporary reference. When free_folio_and_swap_cache() drops that final > >>>> reference, the page enters the final folio_put() release path. > >>>> > >>>> Isolate: > >>>> In parallel, folio_isolate_lru() can observe the same tail page with a > >>>> non-zero refcount and PG_lru set. It clears PG_lru before taking its own > >>>> reference. If this races with the final folio_put() from the split path, > >>>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio(). > >>>> The page is then freed back to the allocator while its lru links are > >>>> still present in the LRU list. A later LRU operation on a neighboring > >>>> page detects the stale link and reports list corruption. > > Something is wrong here with the caller of folio_isolate_lru(), since > folio_isolate_lru() requires the caller to take an elevated refcount. > This means when entering folio_isolate_lru(), the EOF folio should have > at least refcount == 2, 1 from folio_split(), 1 from the caller of > folio_isolate_lru(). 
This should prevent the EOF folio being freed
> by the parallel __folio_put().

This is one of the key points of this issue. Could the isolating caller
take its reference (with folio_get() rather than folio_try_get()) after
the splitter's folio_put() -> folio_put_testzero()? If it can, the panic
happens like this:

CPU1 (split_folio_to_order)            CPU2 (folio_isolate_lru)

split_folio_to_order(F)
folio_ref_freeze(F, 1)
...
lru_add_split_folio(T)
list_add_tail(&T->lru, &F->lru)
folio_set_lru(T)
__filemap_remove_folio(T)
folio_put_refs(T, 1)
/* T refcount == 1, PageLRU set */
free_folio_and_swap_cache(T)
folio_put(T)
/* refcount: 1 -> 0 */
                                       // caller grabs the refcount here?
                                       folio_isolate_lru(T)
                                       folio_test_clear_lru(T)
__folio_put(T)
__page_cache_release(T)
folio_test_lru(T) == false
/* skip lruvec_del_folio(T) */
free_frozen_pages(T)
                                       folio_get(T)
                                       lruvec_del_folio(T)

>
> Hi Zhaoyang, can you elaborate on the folio_isolate_lru() caller?

Sorry, no. The split and isolate parts are only assumptions inferred from
the observed symptoms.

>
> In addition (with the help of Claude), the race trace[2] below
> looks invalid. It says split happens after folio_end_dropbehind(),
> which sets folio->mapping to NULL, but __folio_split() returns -EBUSY
> when folio->mapping is NULL in filemap_release_folio() check.
> So the split cannot happen.

Could folio_needs_release() return false here? In that case
filemap_release_folio() returns true early:

	if (!folio_needs_release(folio))
		return true;

>
> Now I am not sure if the bug report is valid or not. At least for
> folio_split() and folio_isolate_lru(), the race should not exist.
> But let me know if I miss anything.
>
> >>>
> >>> Complicated mess :(
> >>>
> >>> So, folio_isolate_lru() really only requires the caller to hold a folio
> >>> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
> >>> for example, be triggered by memory offlining or page migration.
> >>>
> >>> So we really want to not allow folio_isolate_lru() while we are still processing
> >>> the folio.
> >> > >> Or we should defer adding split folios to LRU after unfreeze. > >> > >>> > >>> What your patch does is, simply not add folios that we will drop from the page > >>> cache to the LRU? > >>> > >>> > >>> You should describe here how you are fixing it: "Let's fix it by..." > >>> > >>>> > >>>> [1] > >>>> [ 22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88) > >>>> [ 22.486130] ------------[ cut here ]------------ > >>>> [ 22.486134] kernel BUG at lib/list_debug.c:67! > >>>> [ 22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP > >>>> [ 22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE > >>>> [ 22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT) > >>>> [ 22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > >>>> [ 22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154 > >>>> [ 22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154 > >>>> [ 22.488539] sp : ffffffc08006b830 > >>>> [ 22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000 > >>>> [ 22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0 > >>>> [ 22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122 > >>>> [ 22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060 > >>>> [ 22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058 > >>>> [ 22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003 > >>>> [ 22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00 > >>>> [ 22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c > >>>> [ 22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010 > >>>> [ 22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d > >>>> [ 22.488647] Call trace: > >>>> [ 22.488651] __list_del_entry_valid_or_report+0x14c/0x154 (P) > >>>> [ 22.488661] 
__folio_put+0x2bc/0x434 > >>>> [ 22.488670] folio_put+0x28/0x58 > >>>> [ 22.488678] do_garbage_collect+0x1a34/0x2584 > >>>> [ 22.488689] f2fs_gc+0x230/0x9b4 > >>>> [ 22.488697] f2fs_fallocate+0xb90/0xdf4 > >>>> [ 22.488706] vfs_fallocate+0x1b4/0x2bc > >>>> [ 22.488716] __arm64_sys_fallocate+0x44/0x78 > >>>> [ 22.488725] invoke_syscall+0x58/0xe4 > >>>> [ 22.488732] do_el0_svc+0x48/0xdc > >>>> [ 22.488739] el0_svc+0x3c/0x98 > >>>> [ 22.488747] el0t_64_sync_handler+0x20/0x130 > >>>> [ 22.488754] el0t_64_sync+0x1c4/0x1c8 > >>>> > >>>> [2] > >>>> CPU0 (f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru) > >>>> > >>>> F: pagecache refs = n > >>>> F: extra refs = GC + split > >>>> F: PG_lru set > >>>> move_data_block() > >>>> folio = f2fs_grab_cache_folio(F) > >>>> ... > >>>> __folio_set_dropbehind(F) > >>>> folio_unlock(F) > >>>> folio_end_dropbehind(F) > >>>> folio_unmap_invalidate(F) > >>>> __filemap_remove_folio(F) > >>>> folio_put_refs(F, n) > >>>> folio_put(F) > >>>> split_folio_to_order(F) > >>>> folio_ref_freeze(F, 1) > >>>> ... > >>>> lru_add_split_folio(T) > >>>> list_add_tail(&T->lru, &F->lru) > >>>> folio_set_lru(T) > >>>> __filemap_remove_folio(T) > >>>> folio_put_refs(T, 1) > >>>> /* T refcount == 1, PageLRU set */ > >>>> free_folio_and_swap_cache(T) > >>>> folio_put(T) > >>>> /* refcount: 1 -> 0 */ > >>>> folio_isolate_lru(T) > >> > >> If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in > >> folio_isolate_lru() would be triggered. Maybe we could just return false in that case. > >> > >>>> folio_test_clear_lru(T) > >>>> __folio_put(T) > >>>> __page_cache_release(T) > >>>> folio_test_lru(T) == false > >>>> /* skip lruvec_del_folio(T) */ > >>>> free_frozen_pages(T) > >>>> folio_get(T) > >>>> lruvec_del_folio(T) > >> > >> But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from LRU list. 
> >> > >>>> later: > >>>> list_del(adjacent->lru) > >>>> next == &T->lru > >>>> next->prev == LIST_POISON / PCP freelist > >>>> BUG > >>>> > >> > >> Why does CPU0 still see the stale link from adjacent? > >> > >>>> Assisted-by: Cursor:claude-opus-4-8 > >>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> > >>> > >>> I'm wondering if this has been broken the whole time, or if some rework allowed > >>> this to trigger. > >>> > >>> I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable? > >>> > >>> Looking into the history, I think we always unconditionally did the > >>> lru_add_split_folio()/lru_add_page_tail(). > >>> > >>>> --- > >>>> mm/huge_memory.c | 2 +- > >>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>> > >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c > >>>> index 970e077019b7..7465525a94a8 100644 > >>>> --- a/mm/huge_memory.c > >>>> +++ b/mm/huge_memory.c > >>>> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n > >>>> folio_ref_unfreeze(new_folio, > >>>> folio_cache_ref_count(new_folio) + 1); > >>>> > >>>> - if (do_lru) > >>>> + if (do_lru && !(mapping && new_folio->index >= end)) > >>> > >>> It might be clearer to write this as > >>> > >>> do_lru && (!mapping || new_folio->index < end) > >>> > >>> To match the page-cache check further below > >>> > >>> if (!mapping) > >>> continue > >>> > >>> ... > >>> if (new_folio->index < end) > >>> ... > >>> > >>>> lru_add_split_folio(folio, new_folio, lruvec, list); > > > > Talked to Claude and find an accounting issue with this. Without putting > > EOF after-split folios back to LRU, they are not going through lruvec_del_folio(), > > which decreases NR_*_LRU counter along with removing the folio from LRU > > and it causes NR_*_LRU accounting errors. 
Note that the original folio > > is on LRU all the time and LRU counters are not modified and after the split > > the original folio size is decreased and the after-split folios need to > > be added back to LRU to keep the LRU counters right. We will need to adjust > > LRU accounting for (!mapping || new_folio->index < end) if we decide to > > not add them back to LRU. > > > > > > Best Regards, > > Yan, Zi > > > Best Regards, > Yan, Zi ^ permalink raw reply [flat|nested] 16+ messages in thread
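The question above turns on the difference between an unconditional increment and an increment-if-not-zero. Below is a hedged C11-atomics model: toy_try_get() stands in for folio_try_get(), which only takes a reference when the count is non-zero, and toy_get() for folio_get() with its CONFIG_DEBUG_VM check compiled out. The names and the sequential replay are illustrative assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio: only the reference count. */
struct toy_folio {
	atomic_int refcount;
};

/* Like folio_put_testzero(): drop one reference, report whether it was the last. */
bool toy_put_testzero(struct toy_folio *f)
{
	int old = atomic_fetch_sub(&f->refcount, 1);

	assert(old > 0);	/* an underflow would be a bug */
	return old == 1;
}

/* Like folio_try_get(): take a reference only if the count is not zero. */
bool toy_try_get(struct toy_folio *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Like folio_get() with CONFIG_DEBUG_VM off: an unconditional increment. */
void toy_get(struct toy_folio *f)
{
	atomic_fetch_add(&f->refcount, 1);
}

struct toy_replay {
	bool last_put;		/* the splitter's put reached zero */
	bool try_get_ok;	/* did the inc-not-zero path succeed? */
	int count_after_get;	/* count after the blind increment */
};

/*
 * Sequential replay of the interleaving asked about above: the splitter
 * holds the only reference and drops it, then a would-be isolator tries
 * both kinds of get.
 */
struct toy_replay toy_replay_race(void)
{
	struct toy_folio t;
	struct toy_replay r;

	atomic_init(&t.refcount, 1);
	r.last_put = toy_put_testzero(&t);	/* 1 -> 0: release path begins */
	r.try_get_ok = toy_try_get(&t);		/* refused, count stays 0 */
	toy_get(&t);				/* revives a dying folio: 0 -> 1 */
	r.count_after_get = atomic_load(&t.refcount);
	return r;
}
```

In the replay, only the blind increment can pick up a folio whose last reference is already gone. If the isolating caller holds its own reference before the splitter's put, as folio_isolate_lru() requires, the splitter's put only takes the count from 2 to 1 and the release path is never entered, which is the argument quoted above.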
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU 2026-06-11 1:19 ` Zhaoyang Huang @ 2026-06-11 1:49 ` Zi Yan 0 siblings, 0 replies; 16+ messages in thread From: Zi Yan @ 2026-06-11 1:49 UTC (permalink / raw) To: Zhaoyang Huang Cc: David Hildenbrand (Arm), zhaoyang.huang, Andrew Morton, Lorenzo Stoakes, Barry Song, Baolin Wang, Lance Yang, Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, linux-mm, linux-kernel, steve.kang, xiuhong.wang, hao_hao.wang On 10 Jun 2026, at 21:19, Zhaoyang Huang wrote: > On Thu, Jun 11, 2026 at 2:44 AM Zi Yan <ziy@nvidia.com> wrote: >> >> On 10 Jun 2026, at 13:25, Zi Yan wrote: >> >>> On 10 Jun 2026, at 10:38, Zi Yan wrote: >>> >>>> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote: >>>> >>>>> On 6/10/26 14:05, zhaoyang.huang wrote: >>>>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com> >>>>>> >>>>>> The kernel panics are keeping to be reported especially when the f2fs >>>>>> partition get almost full. By investigation, we find that the reason is >>>>>> one f2fs page got freed to buddy without being deleted from LRU and the >>>>>> root cause is the race happened in [2] which is enrolled by this commit. >>>>>> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove >>>>>> non-uptodate folio from the page cache in move_data_block"). >>>>> >>>>> But I assume, that other FSes can trigger this as well? Any insights? >>>>> >>>>>> >>>>>> There are 3 race processes in this scenario, please find below for their >>>>>> main activities. However, by further investigation over the code, I >>>>>> think there is a common race window for the truncated folios between >>>>>> split_folio_to_order and folio_isolate_lru, where the folios lost the >>>>>> refcount on page cache and remains the transient one of the split >>>>>> caller, under which the folio could enter free path and compete with the >>>>>> isolation process. 
This commit would like to suggest to have the folios >>>>>> beyond EOF stay out of LRU. >>>>>> >>>>>> Truncate: >>>>>> The changed code in move_data_block() lets the GC path evict the tail-end >>>>>> folio from the page cache through folio_end_dropbehind(). Once >>>>>> folio_unmap_invalidate() removes the folio from mapping->i_pages, the >>>>>> page-cache references for all pages in the folio are dropped. The folio >>>>>> is then kept alive only by temporary external references, which allows a >>>>>> later split to operate on a folio whose subpages are no longer protected >>>>>> by page-cache references. >>>>>> >>>>>> Split: >>>>>> After the page-cache references are gone, split_folio_to_order() can >>>>>> split the big folio into individual pages and put the resulting subpages >>>>>> back on the LRU. For tail pages beyond EOF, split removes them from the >>>>>> page cache and drops their page-cache references. A tail page can then >>>>>> remain on the LRU with PG_lru set while holding only the split caller's >>>>>> temporary reference. When free_folio_and_swap_cache() drops that final >>>>>> reference, the page enters the final folio_put() release path. >>>>>> >>>>>> Isolate: >>>>>> In parallel, folio_isolate_lru() can observe the same tail page with a >>>>>> non-zero refcount and PG_lru set. It clears PG_lru before taking its own >>>>>> reference. If this races with the final folio_put() from the split path, >>>>>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio(). >>>>>> The page is then freed back to the allocator while its lru links are >>>>>> still present in the LRU list. A later LRU operation on a neighboring >>>>>> page detects the stale link and reports list corruption. >> >> Something is wrong here with the caller of folio_isolate_lru(), since >> folio_isolate_lru() requires the caller to take an elevated refcount. 
>> This means when entering folio_isolate_lru(), the EOF folio should have
>> at least refcount == 2, 1 from folio_split(), 1 from the caller of
>> folio_isolate_lru(). This should prevent the EOF folio being freed
>> by the parallel __folio_put().
> This is one of the key points for this issue. Could the isolate caller
> grab the refcount(by folio_get but not folio_try_get) after the
> spliter's folio_put->folio_put_testzero? If it may, then the panic
> happens
>
> CPU1 (split_folio_to_order)            CPU2 (folio_isolate_lru)
>
> split_folio_to_order(F)
> folio_ref_freeze(F, 1)
> ...
> lru_add_split_folio(T)
> list_add_tail(&T->lru, &F->lru)
> folio_set_lru(T)
> __filemap_remove_folio(T)
> folio_put_refs(T, 1)
> /* T refcount == 1, PageLRU set */
> free_folio_and_swap_cache(T)
> folio_put(T)
> /* refcount: 1 -> 0 */
>
>                                        //caller grab the refcount here?

Which caller calls folio_get() instead of folio_try_get()? Claude did not
find any caller doing folio_get() + folio_isolate_lru() except
migrate_device_unmap(), which holds the page table lock to make sure the
folio has a mapping and a non-zero refcount.

Even with folio_get(), there is
VM_BUG_ON_FOLIO(folio_ref_zero_or_close_to_overflow(folio), folio), which
prevents a caller from elevating a folio whose refcount is 0, unless your
runs did not have CONFIG_DEBUG_VM enabled.

>
>                                        folio_isolate_lru(T)
>
>                                        folio_test_clear_lru(T)
> __folio_put(T)
> __page_cache_release(T)
> folio_test_lru(T) == false
> /* skip lruvec_del_folio(T) */
> free_frozen_pages(T)
>                                        folio_get(T)
>
>                                        lruvec_del_folio(T)
>>
>> Hi Zhaoyang, can you elaborate on the folio_isolate_lru() caller?
> Sorry, no. Split and isolate thing are merely assumption by the phenomenons.
>>
>> In addition (with the help of Claude), the race trace[2] below
>> looks invalid. It says split happens after folio_end_dropbehind(),
>> which sets folio->mapping to NULL, but __folio_split() returns -EBUSY
>> when folio->mapping is NULL in filemap_release_folio() check.
>> So the split cannot happen.
> Could the folio_needs_release return false?

Wait, if folio->mapping is NULL and the folio is not anonymous,
folio_check_splittable() returns false at the beginning of
__folio_split(). So the split cannot happen.

>
> if (!folio_needs_release(folio))
> return true;
>
>>
>> Now I am not sure if the bug report is valid or not. At least for
>> folio_split() and folio_isolate_lru(), the race should not exist.
>> But let me know if I miss anything.
>>
>>>>>
>>>>> Complicated mess :(
>>>>>
>>>>> So, folio_isolate_lru() really only requires the caller to hold a folio
>>>>> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
>>>>> for example, be triggered by memory offlining or page migration.
>>>>>
>>>>> So we really want to not allow folio_isolate_lru() while we are still processing
>>>>> the folio.
>>>>
>>>> Or we should defer adding split folios to LRU after unfreeze.
>>>>
>>>>>
>>>>> What your patch does is, simply not add folios that we will drop from the page
>>>>> cache to the LRU?
>>>>>
>>>>>
>>>>> You should describe here how you are fixing it: "Let's fix it by..."
>>>>>
>>>>>>
>>>>>> [1]
>>>>>> [ 22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
>>>>>> [ 22.486130] ------------[ cut here ]------------
>>>>>> [ 22.486134] kernel BUG at lib/list_debug.c:67!
>>>>>> [ 22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP >>>>>> [ 22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE >>>>>> [ 22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT) >>>>>> [ 22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >>>>>> [ 22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154 >>>>>> [ 22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154 >>>>>> [ 22.488539] sp : ffffffc08006b830 >>>>>> [ 22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000 >>>>>> [ 22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0 >>>>>> [ 22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122 >>>>>> [ 22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060 >>>>>> [ 22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058 >>>>>> [ 22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003 >>>>>> [ 22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00 >>>>>> [ 22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c >>>>>> [ 22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010 >>>>>> [ 22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d >>>>>> [ 22.488647] Call trace: >>>>>> [ 22.488651] __list_del_entry_valid_or_report+0x14c/0x154 (P) >>>>>> [ 22.488661] __folio_put+0x2bc/0x434 >>>>>> [ 22.488670] folio_put+0x28/0x58 >>>>>> [ 22.488678] do_garbage_collect+0x1a34/0x2584 >>>>>> [ 22.488689] f2fs_gc+0x230/0x9b4 >>>>>> [ 22.488697] f2fs_fallocate+0xb90/0xdf4 >>>>>> [ 22.488706] vfs_fallocate+0x1b4/0x2bc >>>>>> [ 22.488716] __arm64_sys_fallocate+0x44/0x78 >>>>>> [ 22.488725] invoke_syscall+0x58/0xe4 >>>>>> [ 22.488732] do_el0_svc+0x48/0xdc >>>>>> [ 22.488739] el0_svc+0x3c/0x98 >>>>>> [ 22.488747] el0t_64_sync_handler+0x20/0x130 >>>>>> [ 22.488754] el0t_64_sync+0x1c4/0x1c8 >>>>>> >>>>>> [2] >>>>>> CPU0 
(f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru) >>>>>> >>>>>> F: pagecache refs = n >>>>>> F: extra refs = GC + split >>>>>> F: PG_lru set >>>>>> move_data_block() >>>>>> folio = f2fs_grab_cache_folio(F) >>>>>> ... >>>>>> __folio_set_dropbehind(F) >>>>>> folio_unlock(F) >>>>>> folio_end_dropbehind(F) >>>>>> folio_unmap_invalidate(F) >>>>>> __filemap_remove_folio(F) >>>>>> folio_put_refs(F, n) >>>>>> folio_put(F) >>>>>> split_folio_to_order(F) >>>>>> folio_ref_freeze(F, 1) >>>>>> ... >>>>>> lru_add_split_folio(T) >>>>>> list_add_tail(&T->lru, &F->lru) >>>>>> folio_set_lru(T) >>>>>> __filemap_remove_folio(T) >>>>>> folio_put_refs(T, 1) >>>>>> /* T refcount == 1, PageLRU set */ >>>>>> free_folio_and_swap_cache(T) >>>>>> folio_put(T) >>>>>> /* refcount: 1 -> 0 */ >>>>>> folio_isolate_lru(T) >>>> >>>> If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in >>>> folio_isolate_lru() would be triggered. Maybe we could just return false in that case. >>>> >>>>>> folio_test_clear_lru(T) >>>>>> __folio_put(T) >>>>>> __page_cache_release(T) >>>>>> folio_test_lru(T) == false >>>>>> /* skip lruvec_del_folio(T) */ >>>>>> free_frozen_pages(T) >>>>>> folio_get(T) >>>>>> lruvec_del_folio(T) >>>> >>>> But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from LRU list. >>>> >>>>>> later: >>>>>> list_del(adjacent->lru) >>>>>> next == &T->lru >>>>>> next->prev == LIST_POISON / PCP freelist >>>>>> BUG >>>>>> >>>> >>>> Why does CPU0 still see the stale link from adjacent? >>>> >>>>>> Assisted-by: Cursor:claude-opus-4-8 >>>>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> >>>>> >>>>> I'm wondering if this has been broken the whole time, or if some rework allowed >>>>> this to trigger. >>>>> >>>>> I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable? >>>>> >>>>> Looking into the history, I think we always unconditionally did the >>>>> lru_add_split_folio()/lru_add_page_tail(). 
>>>>>
>>>>>> ---
>>>>>> mm/huge_memory.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 970e077019b7..7465525a94a8 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>>>>>> folio_ref_unfreeze(new_folio,
>>>>>> folio_cache_ref_count(new_folio) + 1);
>>>>>>
>>>>>> - if (do_lru)
>>>>>> + if (do_lru && !(mapping && new_folio->index >= end))
>>>>>
>>>>> It might be clearer to write this as
>>>>>
>>>>> do_lru && (!mapping || new_folio->index < end)
>>>>>
>>>>> To match the page-cache check further below
>>>>>
>>>>> if (!mapping)
>>>>> continue
>>>>>
>>>>> ...
>>>>> if (new_folio->index < end)
>>>>> ...
>>>>>
>>>>>> lru_add_split_folio(folio, new_folio, lruvec, list);
>>>
>>> Talked to Claude and find an accounting issue with this. Without putting
>>> EOF after-split folios back to LRU, they are not going through lruvec_del_folio(),
>>> which decreases NR_*_LRU counter along with removing the folio from LRU
>>> and it causes NR_*_LRU accounting errors. Note that the original folio
>>> is on LRU all the time and LRU counters are not modified and after the split
>>> the original folio size is decreased and the after-split folios need to
>>> be added back to LRU to keep the LRU counters right. We will need to adjust
>>> LRU accounting for (!mapping || new_folio->index < end) if we decide to
>>> not add them back to LRU.
>>>
>>>
>>> Best Regards,
>>> Yan, Zi
>>
>>
>> Best Regards,
>> Yan, Zi

--
Best Regards,
Yan, Zi
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
2026-06-10 14:38 ` Zi Yan
2026-06-10 17:25 ` Zi Yan
@ 2026-06-11 1:39 ` Zhaoyang Huang
2026-06-11 1:56 ` Zi Yan
1 sibling, 1 reply; 16+ messages in thread
From: Zhaoyang Huang @ 2026-06-11 1:39 UTC (permalink / raw)
To: Zi Yan
Cc: David Hildenbrand (Arm), zhaoyang.huang, Andrew Morton,
Lorenzo Stoakes, Barry Song, Baolin Wang, Lance Yang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, linux-mm,
linux-kernel, steve.kang, xiuhong.wang@unisoc.com, hao_hao.wang

On Wed, Jun 10, 2026 at 10:38 PM Zi Yan <ziy@nvidia.com> wrote:
>
> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote:
>
> > On 6/10/26 14:05, zhaoyang.huang wrote:
> >> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >>
> >> The kernel panics are keeping to be reported especially when the f2fs
> >> partition get almost full. By investigation, we find that the reason is
> >> one f2fs page got freed to buddy without being deleted from LRU and the
> >> root cause is the race happened in [2] which is enrolled by this commit.
> >> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove
> >> non-uptodate folio from the page cache in move_data_block").
> >
> > But I assume, that other FSes can trigger this as well? Any insights?

Yes, I think all FSes support big folio could suffer from this defect.

> >
> >>
> >> There are 3 race processes in this scenario, please find below for their
> >> main activities. However, by further investigation over the code, I
> >> think there is a common race window for the truncated folios between
> >> split_folio_to_order and folio_isolate_lru, where the folios lost the
> >> refcount on page cache and remains the transient one of the split
> >> caller, under which the folio could enter free path and compete with the
> >> isolation process. This commit would like to suggest to have the folios
> >> beyond EOF stay out of LRU.
> >>
> >> Truncate:
> >> The changed code in move_data_block() lets the GC path evict the tail-end
> >> folio from the page cache through folio_end_dropbehind(). Once
> >> folio_unmap_invalidate() removes the folio from mapping->i_pages, the
> >> page-cache references for all pages in the folio are dropped. The folio
> >> is then kept alive only by temporary external references, which allows a
> >> later split to operate on a folio whose subpages are no longer protected
> >> by page-cache references.
> >>
> >> Split:
> >> After the page-cache references are gone, split_folio_to_order() can
> >> split the big folio into individual pages and put the resulting subpages
> >> back on the LRU. For tail pages beyond EOF, split removes them from the
> >> page cache and drops their page-cache references. A tail page can then
> >> remain on the LRU with PG_lru set while holding only the split caller's
> >> temporary reference. When free_folio_and_swap_cache() drops that final
> >> reference, the page enters the final folio_put() release path.
> >>
> >> Isolate:
> >> In parallel, folio_isolate_lru() can observe the same tail page with a
> >> non-zero refcount and PG_lru set. It clears PG_lru before taking its own
> >> reference. If this races with the final folio_put() from the split path,
> >> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
> >> The page is then freed back to the allocator while its lru links are
> >> still present in the LRU list. A later LRU operation on a neighboring
> >> page detects the stale link and reports list corruption.
> >
> > Complicated mess :(
> >
> > So, folio_isolate_lru() really only requires the caller to hold a folio
> > reference, which can happen given that we did the folio_ref_unfreeze(). It can,
> > for example, be triggered by memory offlining or page migration.
> >
> > So we really want to not allow folio_isolate_lru() while we are still processing
> > the folio.
>
> Or we should defer adding split folios to LRU after unfreeze.
> >
> > What your patch does, simply not add folios that we will drop from the page
> > cache to the LRU?
> >
> >
> > You should describe here how you are fixing it: "Let's fix it by..."
Yes. This commit would like to suggest to fix it by having the folio
skip the lru_add_split_folio
> >
> >>
> >> [1]
> >> [ 22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
> >> [ 22.486130] ------------[ cut here ]------------
> >> [ 22.486134] kernel BUG at lib/list_debug.c:67!
> >> [ 22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> >> [ 22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
> >> [ 22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
> >> [ 22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> [ 22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
> >> [ 22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
> >> [ 22.488539] sp : ffffffc08006b830
> >> [ 22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
> >> [ 22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
> >> [ 22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
> >> [ 22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
> >> [ 22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
> >> [ 22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
> >> [ 22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
> >> [ 22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
> >> [ 22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
> >> [ 22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
> >> [ 22.488647] Call trace:
> >> [ 22.488651] __list_del_entry_valid_or_report+0x14c/0x154 (P)
> >> [ 22.488661] __folio_put+0x2bc/0x434
> >> [ 22.488670] folio_put+0x28/0x58
> >> [ 22.488678] do_garbage_collect+0x1a34/0x2584
> >> [ 22.488689] f2fs_gc+0x230/0x9b4
> >> [ 22.488697] f2fs_fallocate+0xb90/0xdf4
> >> [ 22.488706] vfs_fallocate+0x1b4/0x2bc
> >> [ 22.488716] __arm64_sys_fallocate+0x44/0x78
> >> [ 22.488725] invoke_syscall+0x58/0xe4
> >> [ 22.488732] do_el0_svc+0x48/0xdc
> >> [ 22.488739] el0_svc+0x3c/0x98
> >> [ 22.488747] el0t_64_sync_handler+0x20/0x130
> >> [ 22.488754] el0t_64_sync+0x1c4/0x1c8
> >>
> >> [2]
> >> CPU0 (f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru)
> >>
> >> F: pagecache refs = n
> >> F: extra refs = GC + split
> >> F: PG_lru set
> >> move_data_block()
> >> folio = f2fs_grab_cache_folio(F)
> >> ...
> >> __folio_set_dropbehind(F)
> >> folio_unlock(F)
> >> folio_end_dropbehind(F)
> >> folio_unmap_invalidate(F)
> >> __filemap_remove_folio(F)
> >> folio_put_refs(F, n)
> >> folio_put(F)
> >> split_folio_to_order(F)
> >> folio_ref_freeze(F, 1)
> >> ...
> >> lru_add_split_folio(T)
> >> list_add_tail(&T->lru, &F->lru)
> >> folio_set_lru(T)
> >> __filemap_remove_folio(T)
> >> folio_put_refs(T, 1)
> >> /* T refcount == 1, PageLRU set */
> >> free_folio_and_swap_cache(T)
> >> folio_put(T)
> >> /* refcount: 1 -> 0 */
> >> folio_isolate_lru(T)
>
> If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in
> folio_isolate_lru() would be triggered. Maybe we could just return false in that case.
No, isolate caller will grab one refcount.
>
> >> folio_test_clear_lru(T)
> >> __folio_put(T)
> >> __page_cache_release(T)
> >> folio_test_lru(T) == false
> >> /* skip lruvec_del_folio(T) */
> >> free_frozen_pages(T)
> >> folio_get(T)
> >> lruvec_del_folio(T)
>
> But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from LRU list.
>
> >> later:
> >> list_del(adjacent->lru)
> >> next == &T->lru
> >> next->prev == LIST_POISON / PCP freelist
> >> BUG
> >>
>
> Why does CPU0 still see the stale link from adjacent?
The staled link should be from LRU since the folio never be deleted from lru.
>
> >> Assisted-by: Cursor:claude-opus-4-8
> >> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > I'm wondering if this has been broken the whole time, or if some rework allowed
> > this to trigger.
This issue is from AOSP with v6.18 which just supports big folio in
f2fs. Besides, it is triggered by the timing of f2fs's partition get
almost full during the test case of filling f2fs's partition(should be
the trigger factor of f2fs's gc which enroll truncate thing)
> >
> > I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable?
> >
> > Looking into the history, I think we always unconditionally did the
> > lru_add_split_folio()/lru_add_page_tail().
> >
> >> ---
> >> mm/huge_memory.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 970e077019b7..7465525a94a8 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> >> folio_ref_unfreeze(new_folio,
> >> folio_cache_ref_count(new_folio) + 1);
> >>
> >> - if (do_lru)
> >> + if (do_lru && !(mapping && new_folio->index >= end))
> >
> > It might be clearer to write this as
> >
> > do_lru && (!mapping || new_folio->index < end)
> >
> > To match the page-cache check further below
> >
> > if (!mapping)
> > continue
> >
> > ...
> > if (new_folio->index < end)
> > ...
> >
> >> lru_add_split_folio(folio, new_folio, lruvec, list);
> >>
> >> /*
> >
> > folio_check_splittable() makes sure that we have a mapping for non-anon folios.
> > (no truncation). end is then only set for non-anon folios.
> >
> > @Zi, any thoughts?
>
> The fix works but I feel that it is masking the race between folio_isolate_lru() and
> folio_put(). I worry that the same issue might be triggered in other ways or
> in new code if we do not fix the race.
>
> To summarize my thoughts above:
> 1. adding frozen folios in LRU might be problematic, since folio_isolate_lru()
> has a VM_BUG_ON_FOLIO() for it but still chooses to proceed the isolation.
>
> 2. the race analysis is not clear, since both folio_isolate_lru() and folio_put()
> do lruvec_del_folio() if folio is on LRU. When list_del(adjacent->lru) sees
> the stale link, the folio is already in buddy and page->lru is modified for
> PageBuddy use? So even without CPU0, folio_isolate_lru()'s lruvec_del_folio()
> can do the wrong thing on pages on buddy?
>
>
> --
> Best Regards,
> Yan, Zi
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
2026-06-11 1:39 ` Zhaoyang Huang
@ 2026-06-11 1:56 ` Zi Yan
2026-06-11 2:39 ` Zhaoyang Huang
0 siblings, 1 reply; 16+ messages in thread
From: Zi Yan @ 2026-06-11 1:56 UTC (permalink / raw)
To: Zhaoyang Huang
Cc: David Hildenbrand (Arm), zhaoyang.huang, Andrew Morton,
Lorenzo Stoakes, Barry Song, Baolin Wang, Lance Yang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, linux-mm,
linux-kernel, steve.kang, xiuhong.wang, hao_hao.wang

On 10 Jun 2026, at 21:39, Zhaoyang Huang wrote:

> On Wed, Jun 10, 2026 at 10:38 PM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote:
>>
>>> On 6/10/26 14:05, zhaoyang.huang wrote:
>>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>>
>>>> The kernel panics are keeping to be reported especially when the f2fs
>>>> partition get almost full. By investigation, we find that the reason is
>>>> one f2fs page got freed to buddy without being deleted from LRU and the
>>>> root cause is the race happened in [2] which is enrolled by this commit.
>>>> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove
>>>> non-uptodate folio from the page cache in move_data_block").
>>>
>>> But I assume, that other FSes can trigger this as well? Any insights?
>
> Yes, I think all FSes support big folio could suffer from this defect.
>
>>>
>>>>
>>>> There are 3 race processes in this scenario, please find below for their
>>>> main activities. However, by further investigation over the code, I
>>>> think there is a common race window for the truncated folios between
>>>> split_folio_to_order and folio_isolate_lru, where the folios lost the
>>>> refcount on page cache and remains the transient one of the split
>>>> caller, under which the folio could enter free path and compete with the
>>>> isolation process. This commit would like to suggest to have the folios
>>>> beyond EOF stay out of LRU.
>>>>
>>>> Truncate:
>>>> The changed code in move_data_block() lets the GC path evict the tail-end
>>>> folio from the page cache through folio_end_dropbehind(). Once
>>>> folio_unmap_invalidate() removes the folio from mapping->i_pages, the
>>>> page-cache references for all pages in the folio are dropped. The folio
>>>> is then kept alive only by temporary external references, which allows a
>>>> later split to operate on a folio whose subpages are no longer protected
>>>> by page-cache references.
>>>>
>>>> Split:
>>>> After the page-cache references are gone, split_folio_to_order() can
>>>> split the big folio into individual pages and put the resulting subpages
>>>> back on the LRU. For tail pages beyond EOF, split removes them from the
>>>> page cache and drops their page-cache references. A tail page can then
>>>> remain on the LRU with PG_lru set while holding only the split caller's
>>>> temporary reference. When free_folio_and_swap_cache() drops that final
>>>> reference, the page enters the final folio_put() release path.
>>>>
>>>> Isolate:
>>>> In parallel, folio_isolate_lru() can observe the same tail page with a
>>>> non-zero refcount and PG_lru set. It clears PG_lru before taking its own
>>>> reference. If this races with the final folio_put() from the split path,
>>>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
>>>> The page is then freed back to the allocator while its lru links are
>>>> still present in the LRU list. A later LRU operation on a neighboring
>>>> page detects the stale link and reports list corruption.
>>>
>>> Complicated mess :(
>>>
>>> So, folio_isolate_lru() really only requires the caller to hold a folio
>>> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
>>> for example, be triggered by memory offlining or page migration.
>>>
>>> So we really want to not allow folio_isolate_lru() while we are still processing
>>> the folio.
>>
>> Or we should defer adding split folios to LRU after unfreeze.
>>
>>>
>>> What your patch does, simply not add folios that we will drop from the page
>>> cache to the LRU?
>>>
>>>
>>> You should describe here how you are fixing it: "Let's fix it by..."
> Yes. This commit would like to suggest to fix it by having the folio
> skip the lru_add_split_folio

Skipping it causes more issues like LRU counter mismatch, firing up bad_page()
since PG_active, PG_unevictable, or MGLRU fields in ->flags.f could stay
uncleared at page free time.

>>>
>>>>
>>>> [1]
>>>> [ 22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
>>>> [ 22.486130] ------------[ cut here ]------------
>>>> [ 22.486134] kernel BUG at lib/list_debug.c:67!
>>>> [ 22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>>> [ 22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
>>>> [ 22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
>>>> [ 22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> [ 22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
>>>> [ 22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
>>>> [ 22.488539] sp : ffffffc08006b830
>>>> [ 22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
>>>> [ 22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
>>>> [ 22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
>>>> [ 22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
>>>> [ 22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
>>>> [ 22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
>>>> [ 22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
>>>> [ 22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
>>>> [ 22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
>>>> [ 22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
>>>> [ 22.488647] Call trace:
>>>> [ 22.488651] __list_del_entry_valid_or_report+0x14c/0x154 (P)
>>>> [ 22.488661] __folio_put+0x2bc/0x434
>>>> [ 22.488670] folio_put+0x28/0x58
>>>> [ 22.488678] do_garbage_collect+0x1a34/0x2584
>>>> [ 22.488689] f2fs_gc+0x230/0x9b4
>>>> [ 22.488697] f2fs_fallocate+0xb90/0xdf4
>>>> [ 22.488706] vfs_fallocate+0x1b4/0x2bc
>>>> [ 22.488716] __arm64_sys_fallocate+0x44/0x78
>>>> [ 22.488725] invoke_syscall+0x58/0xe4
>>>> [ 22.488732] do_el0_svc+0x48/0xdc
>>>> [ 22.488739] el0_svc+0x3c/0x98
>>>> [ 22.488747] el0t_64_sync_handler+0x20/0x130
>>>> [ 22.488754] el0t_64_sync+0x1c4/0x1c8
>>>>
>>>> [2]
>>>> CPU0 (f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru)
>>>>
>>>> F: pagecache refs = n
>>>> F: extra refs = GC + split
>>>> F: PG_lru set
>>>> move_data_block()
>>>> folio = f2fs_grab_cache_folio(F)
>>>> ...
>>>> __folio_set_dropbehind(F)
>>>> folio_unlock(F)
>>>> folio_end_dropbehind(F)
>>>> folio_unmap_invalidate(F)
>>>> __filemap_remove_folio(F)
>>>> folio_put_refs(F, n)
>>>> folio_put(F)
>>>> split_folio_to_order(F)
>>>> folio_ref_freeze(F, 1)
>>>> ...
>>>> lru_add_split_folio(T)
>>>> list_add_tail(&T->lru, &F->lru)
>>>> folio_set_lru(T)
>>>> __filemap_remove_folio(T)
>>>> folio_put_refs(T, 1)
>>>> /* T refcount == 1, PageLRU set */
>>>> free_folio_and_swap_cache(T)
>>>> folio_put(T)
>>>> /* refcount: 1 -> 0 */
>>>> folio_isolate_lru(T)
>>
>> If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in
>> folio_isolate_lru() would be triggered. Maybe we could just return false in that case.
> No, isolate caller will grab one refcount.

As I said in another email, isolate caller cannot grab a refcount when folio refcount
is 0.
>>
>>>> folio_test_clear_lru(T)
>>>> __folio_put(T)
>>>> __page_cache_release(T)
>>>> folio_test_lru(T) == false
>>>> /* skip lruvec_del_folio(T) */
>>>> free_frozen_pages(T)
>>>> folio_get(T)
>>>> lruvec_del_folio(T)
>>
>> But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from LRU list.
>>
>>>> later:
>>>> list_del(adjacent->lru)
>>>> next == &T->lru
>>>> next->prev == LIST_POISON / PCP freelist
>>>> BUG
>>>>
>>
>> Why does CPU0 still see the stale link from adjacent?
> The staled link should be from LRU since the folio never be deleted from lru.
>>
>>>> Assisted-by: Cursor:claude-opus-4-8
>>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>
>>> I'm wondering if this has been broken the whole time, or if some rework allowed
>>> this to trigger.
> This issue is from AOSP with v6.18 which just supports big folio in
> f2fs. Besides, it is triggered by the timing of f2fs's partition get
> almost full during the test case of filling f2fs's partition(should be
> the trigger factor of f2fs's gc which enroll truncate thing)

Are you able to reproduce it with other FSes supporting large folio?

>>>
>>> I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable?
>>>
>>> Looking into the history, I think we always unconditionally did the
>>> lru_add_split_folio()/lru_add_page_tail().
>>>
>>>> ---
>>>> mm/huge_memory.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 970e077019b7..7465525a94a8 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>>>> folio_ref_unfreeze(new_folio,
>>>> folio_cache_ref_count(new_folio) + 1);
>>>>
>>>> - if (do_lru)
>>>> + if (do_lru && !(mapping && new_folio->index >= end))
>>>
>>> It might be clearer to write this as
>>>
>>> do_lru && (!mapping || new_folio->index < end)
>>>
>>> To match the page-cache check further below
>>>
>>> if (!mapping)
>>> continue
>>>
>>> ...
>>> if (new_folio->index < end)
>>> ...
>>>
>>>> lru_add_split_folio(folio, new_folio, lruvec, list);
>>>>
>>>> /*
>>>
>>> folio_check_splittable() makes sure that we have a mapping for non-anon folios.
>>> (no truncation). end is then only set for non-anon folios.
>>>
>>> @Zi, any thoughts?
>>
>> The fix works but I feel that it is masking the race between folio_isolate_lru() and
>> folio_put(). I worry that the same issue might be triggered in other ways or
>> in new code if we do not fix the race.
>>
>> To summarize my thoughts above:
>> 1. adding frozen folios in LRU might be problematic, since folio_isolate_lru()
>> has a VM_BUG_ON_FOLIO() for it but still chooses to proceed the isolation.
>>
>> 2. the race analysis is not clear, since both folio_isolate_lru() and folio_put()
>> do lruvec_del_folio() if folio is on LRU. When list_del(adjacent->lru) sees
>> the stale link, the folio is already in buddy and page->lru is modified for
>> PageBuddy use? So even without CPU0, folio_isolate_lru()'s lruvec_del_folio()
>> can do the wrong thing on pages on buddy?
>>
>>
>> --
>> Best Regards,
>> Yan, Zi

--
Best Regards,
Yan, Zi
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
2026-06-11 1:56 ` Zi Yan
@ 2026-06-11 2:39 ` Zhaoyang Huang
2026-06-11 3:06 ` Zi Yan
0 siblings, 1 reply; 16+ messages in thread
From: Zhaoyang Huang @ 2026-06-11 2:39 UTC (permalink / raw)
To: Zi Yan
Cc: David Hildenbrand (Arm), zhaoyang.huang, Andrew Morton,
Lorenzo Stoakes, Barry Song, Baolin Wang, Lance Yang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, linux-mm,
linux-kernel, steve.kang, xiuhong.wang, hao_hao.wang,
jyescas@google.com

On Thu, Jun 11, 2026 at 9:56 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 10 Jun 2026, at 21:39, Zhaoyang Huang wrote:
>
> > On Wed, Jun 10, 2026 at 10:38 PM Zi Yan <ziy@nvidia.com> wrote:
> >>
> >> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote:
> >>
> >>> On 6/10/26 14:05, zhaoyang.huang wrote:
> >>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >>>>
> >>>> The kernel panics are keeping to be reported especially when the f2fs
> >>>> partition get almost full. By investigation, we find that the reason is
> >>>> one f2fs page got freed to buddy without being deleted from LRU and the
> >>>> root cause is the race happened in [2] which is enrolled by this commit.
> >>>> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove
> >>>> non-uptodate folio from the page cache in move_data_block").
> >>>
> >>> But I assume, that other FSes can trigger this as well? Any insights?
> >
> > Yes, I think all FSes support big folio could suffer from this defect.
> >
> >>>
> >>>>
> >>>> There are 3 race processes in this scenario, please find below for their
> >>>> main activities. However, by further investigation over the code, I
> >>>> think there is a common race window for the truncated folios between
> >>>> split_folio_to_order and folio_isolate_lru, where the folios lost the
> >>>> refcount on page cache and remains the transient one of the split
> >>>> caller, under which the folio could enter free path and compete with the
> >>>> isolation process. This commit would like to suggest to have the folios
> >>>> beyond EOF stay out of LRU.
> >>>>
> >>>> Truncate:
> >>>> The changed code in move_data_block() lets the GC path evict the tail-end
> >>>> folio from the page cache through folio_end_dropbehind(). Once
> >>>> folio_unmap_invalidate() removes the folio from mapping->i_pages, the
> >>>> page-cache references for all pages in the folio are dropped. The folio
> >>>> is then kept alive only by temporary external references, which allows a
> >>>> later split to operate on a folio whose subpages are no longer protected
> >>>> by page-cache references.
> >>>>
> >>>> Split:
> >>>> After the page-cache references are gone, split_folio_to_order() can
> >>>> split the big folio into individual pages and put the resulting subpages
> >>>> back on the LRU. For tail pages beyond EOF, split removes them from the
> >>>> page cache and drops their page-cache references. A tail page can then
> >>>> remain on the LRU with PG_lru set while holding only the split caller's
> >>>> temporary reference. When free_folio_and_swap_cache() drops that final
> >>>> reference, the page enters the final folio_put() release path.
> >>>>
> >>>> Isolate:
> >>>> In parallel, folio_isolate_lru() can observe the same tail page with a
> >>>> non-zero refcount and PG_lru set. It clears PG_lru before taking its own
> >>>> reference. If this races with the final folio_put() from the split path,
> >>>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
> >>>> The page is then freed back to the allocator while its lru links are
> >>>> still present in the LRU list. A later LRU operation on a neighboring
> >>>> page detects the stale link and reports list corruption.
> >>>
> >>> Complicated mess :(
> >>>
> >>> So, folio_isolate_lru() really only requires the caller to hold a folio
> >>> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
> >>> for example, be triggered by memory offlining or page migration.
> >>>
> >>> So we really want to not allow folio_isolate_lru() while we are still processing
> >>> the folio.
> >>
> >> Or we should defer adding split folios to LRU after unfreeze.
> >>
> >>>
> >>> What your patch does, simply not add folios that we will drop from the page
> >>> cache to the LRU?
> >>>
> >>>
> >>> You should describe here how you are fixing it: "Let's fix it by..."
> > Yes. This commit would like to suggest to fix it by having the folio
> > skip the lru_add_split_folio
>
> Skipping it causes more issues like LRU counter mismatch, firing up bad_page()
> since PG_active, PG_unevictable, or MGLRU fields in ->flags.f could stay
> uncleared at page free time.
OK, we should solve this issue.
>
> >>>
> >>>>
> >>>> [1]
> >>>> [ 22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
> >>>> [ 22.486130] ------------[ cut here ]------------
> >>>> [ 22.486134] kernel BUG at lib/list_debug.c:67!
> >>>> [ 22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> >>>> [ 22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
> >>>> [ 22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
> >>>> [ 22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>>> [ 22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
> >>>> [ 22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
> >>>> [ 22.488539] sp : ffffffc08006b830
> >>>> [ 22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
> >>>> [ 22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
> >>>> [ 22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
> >>>> [ 22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
> >>>> [ 22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
> >>>> [ 22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
> >>>> [ 22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
> >>>> [ 22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
> >>>> [ 22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
> >>>> [ 22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
> >>>> [ 22.488647] Call trace:
> >>>> [ 22.488651] __list_del_entry_valid_or_report+0x14c/0x154 (P)
> >>>> [ 22.488661] __folio_put+0x2bc/0x434
> >>>> [ 22.488670] folio_put+0x28/0x58
> >>>> [ 22.488678] do_garbage_collect+0x1a34/0x2584
> >>>> [ 22.488689] f2fs_gc+0x230/0x9b4
> >>>> [ 22.488697] f2fs_fallocate+0xb90/0xdf4
> >>>> [ 22.488706] vfs_fallocate+0x1b4/0x2bc
> >>>> [ 22.488716] __arm64_sys_fallocate+0x44/0x78
> >>>> [ 22.488725] invoke_syscall+0x58/0xe4
> >>>> [ 22.488732] do_el0_svc+0x48/0xdc
> >>>> [ 22.488739] el0_svc+0x3c/0x98
> >>>> [ 22.488747] el0t_64_sync_handler+0x20/0x130
> >>>> [ 22.488754] el0t_64_sync+0x1c4/0x1c8
> >>>>
> >>>> [2]
> >>>> CPU0 (f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru)
> >>>>
> >>>> F: pagecache refs = n
> >>>> F: extra refs = GC + split
> >>>> F: PG_lru set
> >>>> move_data_block()
> >>>> folio = f2fs_grab_cache_folio(F)
> >>>> ...
> >>>> __folio_set_dropbehind(F)
> >>>> folio_unlock(F)
> >>>> folio_end_dropbehind(F)
> >>>> folio_unmap_invalidate(F)
> >>>> __filemap_remove_folio(F)
> >>>> folio_put_refs(F, n)
> >>>> folio_put(F)
> >>>> split_folio_to_order(F)
> >>>> folio_ref_freeze(F, 1)
> >>>> ...
> >>>> lru_add_split_folio(T)
> >>>> list_add_tail(&T->lru, &F->lru)
> >>>> folio_set_lru(T)
> >>>> __filemap_remove_folio(T)
> >>>> folio_put_refs(T, 1)
> >>>> /* T refcount == 1, PageLRU set */
> >>>> free_folio_and_swap_cache(T)
> >>>> folio_put(T)
> >>>> /* refcount: 1 -> 0 */
> >>>> folio_isolate_lru(T)
> >>
> >> If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in
> >> folio_isolate_lru() would be triggered. Maybe we could just return false in that case.
> > No, isolate caller will grab one refcount.
>
> As I said in another email, isolate caller cannot grab a refcount when folio refcount
> is 0.

pin_user_pages*(..., FOLL_LONGTERM)
└─ __gup_longterm_locked() [gup.c:2465]
   │
   ├─ follow_page_pte() [gup.c:802]
   │  │
   │  └─ try_grab_folio() [gup.c:858]
   │        if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
   │                return -ENOMEM;
   │        // Could __folio_split->folio_put could race here ?
   │        if (flags & FOLL_GET)
   │                folio_ref_add(folio, refs);
   │
   └─ check_and_migrate_movable_pages() [gup.c:2490]
      └─ collect_longterm_unpinnable_folios() [gup.c:2391]
         └─ └─if (!folio_isolate_lru(folio))

Could the __folio_split race in the above scenario? It looks like
try_grab_folio set the refcount without using atomic operation.

>(from previous mail)
> Wait, if folio->mapping is NULL and folio is not anonymous,
> folio_check_splittable() returns false at the beginning of
> __folio_split(). So the split cannot happen.
According to my understanding, the folio checked here is still big
folio which is locked and with folio->mapping set, right?
>
> >>
> >>>> folio_test_clear_lru(T)
> >>>> __folio_put(T)
> >>>> __page_cache_release(T)
> >>>> folio_test_lru(T) == false
> >>>> /* skip lruvec_del_folio(T) */
> >>>> free_frozen_pages(T)
> >>>> folio_get(T)
> >>>> lruvec_del_folio(T)
> >>
> >> But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from LRU list.
> >>
> >>>> later:
> >>>> list_del(adjacent->lru)
> >>>> next == &T->lru
> >>>> next->prev == LIST_POISON / PCP freelist
> >>>> BUG
> >>>>
> >>
> >> Why does CPU0 still see the stale link from adjacent?
> > The staled link should be from LRU since the folio never be deleted from lru.
> >>
> >>>> Assisted-by: Cursor:claude-opus-4-8
> >>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >>>
> >>> I'm wondering if this has been broken the whole time, or if some rework allowed
> >>> this to trigger.
> > This issue is from AOSP with v6.18 which just supports big folio in
> > f2fs. Besides, it is triggered by the timing of f2fs's partition get
> > almost full during the test case of filling f2fs's partition(should be
> > the trigger factor of f2fs's gc which enroll truncate thing)
>
> Are you able to reproduce it with other FSes supporting large folio?

Sorry, I can't so far since only f2fs has gc in the Android system.

>
> >>>
> >>> I assume the issue can be triggered for other FSes, and we want Fixes: + CC: stable?
> >>>
> >>> Looking into the history, I think we always unconditionally did the
> >>> lru_add_split_folio()/lru_add_page_tail().
> >>>
> >>>> ---
> >>>> mm/huge_memory.c | 2 +-
> >>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index 970e077019b7..7465525a94a8 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -3966,7 +3966,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> >>>> folio_ref_unfreeze(new_folio,
> >>>> folio_cache_ref_count(new_folio) + 1);
> >>>>
> >>>> - if (do_lru)
> >>>> + if (do_lru && !(mapping && new_folio->index >= end))
> >>>
> >>> It might be clearer to write this as
> >>>
> >>> do_lru && (!mapping || new_folio->index < end)
> >>>
> >>> To match the page-cache check further below
> >>>
> >>> if (!mapping)
> >>> continue
> >>>
> >>> ...
> >>> if (new_folio->index < end)
> >>> ...
> >>>
> >>>> lru_add_split_folio(folio, new_folio, lruvec, list);
> >>>>
> >>>> /*
> >>>
> >>> folio_check_splittable() makes sure that we have a mapping for non-anon folios.
> >>> (no truncation). end is then only set for non-anon folios.
> >>>
> >>> @Zi, any thoughts?
> >>
> >> The fix works but I feel that it is masking the race between folio_isolate_lru() and
> >> folio_put(). I worry that the same issue might be triggered in other ways or
> >> in new code if we do not fix the race.
> >>
> >> To summarize my thoughts above:
> >> 1. adding frozen folios in LRU might be problematic, since folio_isolate_lru()
> >> has a VM_BUG_ON_FOLIO() for it but still chooses to proceed the isolation.
> >>
> >> 2. the race analysis is not clear, since both folio_isolate_lru() and folio_put()
> >> do lruvec_del_folio() if folio is on LRU. When list_del(adjacent->lru) sees
> >> the stale link, the folio is already in buddy and page->lru is modified for
> >> PageBuddy use? So even without CPU0, folio_isolate_lru()'s lruvec_del_folio()
> >> can do the wrong thing on pages on buddy?
> >>
> >>
> >> --
> >> Best Regards,
> >> Yan, Zi
> >
> --
> Best Regards,
> Yan, Zi
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
2026-06-11 2:39 ` Zhaoyang Huang
@ 2026-06-11 3:06 ` Zi Yan
2026-06-11 7:45 ` Zhaoyang Huang
0 siblings, 1 reply; 16+ messages in thread
From: Zi Yan @ 2026-06-11 3:06 UTC
To: Zhaoyang Huang
Cc: David Hildenbrand (Arm), zhaoyang.huang, Andrew Morton,
Lorenzo Stoakes, Barry Song, Baolin Wang, Lance Yang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, linux-mm,
linux-kernel, steve.kang, xiuhong.wang, hao_hao.wang, jyescas

On 10 Jun 2026, at 22:39, Zhaoyang Huang wrote:

> On Thu, Jun 11, 2026 at 9:56 AM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On 10 Jun 2026, at 21:39, Zhaoyang Huang wrote:
>>
>>> On Wed, Jun 10, 2026 at 10:38 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>
>>>> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote:
>>>>
>>>>> On 6/10/26 14:05, zhaoyang.huang wrote:
>>>>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>>>>
>>>>>> The kernel panics are keeping to be reported especially when the f2fs
>>>>>> partition get almost full. By investigation, we find that the reason is
>>>>>> one f2fs page got freed to buddy without being deleted from LRU and the
>>>>>> root cause is the race happened in [2] which is enrolled by this commit.
>>>>>> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove
>>>>>> non-uptodate folio from the page cache in move_data_block").
>>>>>
>>>>> But I assume, that other FSes can trigger this as well? Any insights?
>>>
>>> Yes, I think all FSes support big folio could suffer from this defect.
>>>
>>>>>
>>>>>>
>>>>>> There are 3 race processes in this scenario, please find below for their
>>>>>> main activities. However, by further investigation over the code, I
>>>>>> think there is a common race window for the truncated folios between
>>>>>> split_folio_to_order and folio_isolate_lru, where the folios lost the
>>>>>> refcount on page cache and remains the transient one of the split
>>>>>> caller, under which the folio could enter free path and compete with the
>>>>>> isolation process. This commit would like to suggest to have the folios
>>>>>> beyond EOF stay out of LRU.
>>>>>>
>>>>>> Truncate:
>>>>>> The changed code in move_data_block() lets the GC path evict the tail-end
>>>>>> folio from the page cache through folio_end_dropbehind(). Once
>>>>>> folio_unmap_invalidate() removes the folio from mapping->i_pages, the
>>>>>> page-cache references for all pages in the folio are dropped. The folio
>>>>>> is then kept alive only by temporary external references, which allows a
>>>>>> later split to operate on a folio whose subpages are no longer protected
>>>>>> by page-cache references.
>>>>>>
>>>>>> Split:
>>>>>> After the page-cache references are gone, split_folio_to_order() can
>>>>>> split the big folio into individual pages and put the resulting subpages
>>>>>> back on the LRU. For tail pages beyond EOF, split removes them from the
>>>>>> page cache and drops their page-cache references. A tail page can then
>>>>>> remain on the LRU with PG_lru set while holding only the split caller's
>>>>>> temporary reference. When free_folio_and_swap_cache() drops that final
>>>>>> reference, the page enters the final folio_put() release path.
>>>>>>
>>>>>> Isolate:
>>>>>> In parallel, folio_isolate_lru() can observe the same tail page with a
>>>>>> non-zero refcount and PG_lru set. It clears PG_lru before taking its own
>>>>>> reference. If this races with the final folio_put() from the split path,
>>>>>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
>>>>>> The page is then freed back to the allocator while its lru links are
>>>>>> still present in the LRU list. A later LRU operation on a neighboring
>>>>>> page detects the stale link and reports list corruption.
>>>>>
>>>>> Complicated mess :(
>>>>>
>>>>> So, folio_isolate_lru() really only requires the caller to hold a folio
>>>>> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
>>>>> for example, be triggered by memory offlining or page migration.
>>>>>
>>>>> So we really want to not allow folio_isolate_lru() while we are still processing
>>>>> the folio.
>>>>
>>>> Or we should defer adding split folios to LRU after unfreeze.
>>>>
>>>>>
>>>>> What your patch does is, simply not add folios that we will drop from the page
>>>>> cache to the LRU?
>>>>>
>>>>>
>>>>> You should describe here how you are fixing it: "Let's fix it by..."
>>> Yes. This commit would like to suggest to fix it by having the folio
>>> skip the lru_add_split_folio
>>
>> Skipping it causes more issues like LRU counter mismatch, firing up bad_page()
>> since PG_active, PG_unevictable, or MGLRU fields in ->flags.f could stay
>> uncleared at page free time.
>
> OK, we should solve this issue.
>>
>>>>>
>>>>>>
>>>>>> [1]
>>>>>> [ 22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
>>>>>> [ 22.486130] ------------[ cut here ]------------
>>>>>> [ 22.486134] kernel BUG at lib/list_debug.c:67!
>>>>>> [ 22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>>>>> [ 22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
>>>>>> [ 22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
>>>>>> [ 22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>> [ 22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
>>>>>> [ 22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
>>>>>> [ 22.488539] sp : ffffffc08006b830
>>>>>> [ 22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
>>>>>> [ 22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
>>>>>> [ 22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
>>>>>> [ 22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
>>>>>> [ 22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
>>>>>> [ 22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
>>>>>> [ 22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
>>>>>> [ 22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
>>>>>> [ 22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
>>>>>> [ 22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
>>>>>> [ 22.488647] Call trace:
>>>>>> [ 22.488651] __list_del_entry_valid_or_report+0x14c/0x154 (P)
>>>>>> [ 22.488661] __folio_put+0x2bc/0x434
>>>>>> [ 22.488670] folio_put+0x28/0x58
>>>>>> [ 22.488678] do_garbage_collect+0x1a34/0x2584
>>>>>> [ 22.488689] f2fs_gc+0x230/0x9b4
>>>>>> [ 22.488697] f2fs_fallocate+0xb90/0xdf4
>>>>>> [ 22.488706] vfs_fallocate+0x1b4/0x2bc
>>>>>> [ 22.488716] __arm64_sys_fallocate+0x44/0x78
>>>>>> [ 22.488725] invoke_syscall+0x58/0xe4
>>>>>> [ 22.488732] do_el0_svc+0x48/0xdc
>>>>>> [ 22.488739] el0_svc+0x3c/0x98
>>>>>> [ 22.488747] el0t_64_sync_handler+0x20/0x130
>>>>>> [ 22.488754] el0t_64_sync+0x1c4/0x1c8
>>>>>>
>>>>>> [2]
>>>>>> CPU0 (f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru)
>>>>>>
>>>>>> F: pagecache refs = n
>>>>>> F: extra refs = GC + split
>>>>>> F: PG_lru set
>>>>>> move_data_block()
>>>>>> folio = f2fs_grab_cache_folio(F)
>>>>>> ...
>>>>>> __folio_set_dropbehind(F)
>>>>>> folio_unlock(F)
>>>>>> folio_end_dropbehind(F)
>>>>>> folio_unmap_invalidate(F)
>>>>>> __filemap_remove_folio(F)
>>>>>> folio_put_refs(F, n)
>>>>>> folio_put(F)
>>>>>> split_folio_to_order(F)
>>>>>> folio_ref_freeze(F, 1)
>>>>>> ...
>>>>>> lru_add_split_folio(T)
>>>>>> list_add_tail(&T->lru, &F->lru)
>>>>>> folio_set_lru(T)
>>>>>> __filemap_remove_folio(T)
>>>>>> folio_put_refs(T, 1)
>>>>>> /* T refcount == 1, PageLRU set */
>>>>>> free_folio_and_swap_cache(T)
>>>>>> folio_put(T)
>>>>>> /* refcount: 1 -> 0 */
>>>>>> folio_isolate_lru(T)
>>>>
>>>> If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in
>>>> folio_isolate_lru() would be triggered. Maybe we could just return false in that case.
>>> No, isolate caller will grab one refcount.
>>
>> As I said in another email, isolate caller cannot grab a refcount when folio refcount
>> is 0.
>
> pin_user_pages*(..., FOLL_LONGTERM)
>  └─ __gup_longterm_locked() [gup.c:2465]
>     │ ├─ follow_page_pte() [gup.c:802]
>     │ │ └─ try_grab_folio() [gup.c:858]
>                 if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
>                         return -ENOMEM;
>
>                 // Could __folio_split->folio_put could race here ?
>                 if (flags & FOLL_GET)
>                         folio_ref_add(folio, refs);
>     └─ check_and_migrate_movable_pages() [gup.c:2490]
>        └─ collect_longterm_unpinnable_folios() [gup.c:2391]
>           └─ └─if (!folio_isolate_lru(folio))
>
> Could the __folio_split race in the above scenario? It looks like
> try_grab_folio set the refcount without using atomic operation.

folio_ref_add() used by try_grab_folio() is an atomic op.
Which refcount change is not atomic here?

In addition, who is GUPing f2fs folio?

I think you need to find the actual f2fs code path instead of
chasing theoretical code combinations.

>
>> (from previous mail)
>> Wait, if folio->mapping is NULL and folio is not anonymous,
>> folio_check_splittable() returns false at the beginning of
>> __folio_split(). So the split cannot happen.
>
> According to my understanding, the folio checked here is still big
> folio which is locked and with folio->mapping set, right?

But the provided trace says the folio is split after folio_end_dropbehind(F)
and folio->mapping is NULL.

>>
>>>>
>>>>>> folio_test_clear_lru(T)
>>>>>> __folio_put(T)
>>>>>> __page_cache_release(T)
>>>>>> folio_test_lru(T) == false
>>>>>> /* skip lruvec_del_folio(T) */
>>>>>> free_frozen_pages(T)
>>>>>> folio_get(T)
>>>>>> lruvec_del_folio(T)
>>>>
>>>> But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from LRU list.
>>>>
>>>>>> later:
>>>>>> list_del(adjacent->lru)
>>>>>> next == &T->lru
>>>>>> next->prev == LIST_POISON / PCP freelist
>>>>>> BUG
>>>>>>
>>>>
>>>> Why does CPU0 still see the stale link from adjacent?
>>> The staled link should be from LRU since the folio never be deleted from lru.
>>>>
>>>>>> Assisted-by: Cursor:claude-opus-4-8
>>>>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>>>
>>>>> I'm wondering if this has been broken the whole time, or if some rework allowed
>>>>> this to trigger.
>>> This issue is from AOSP with v6.18 which just supports big folio in
>>> f2fs. Besides, it is triggered by the timing of f2fs's partition get
>>> almost full during the test case of filling f2fs's partition(should be
>>> the trigger factor of f2fs's gc which enroll truncate thing)
>>
>> Are you able to reproduce it with other FSes supporting large folio?
>
> Sorry, I can't so far since only f2fs has gc in the Android system.

Have you checked f2fs gc code to make sure it is working correctly?
BTW, what makes you think the issue is related to folio_split()?
Can you elaborate more on your investigation?

Thanks.

--
Best Regards,
Yan, Zi
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
2026-06-11 3:06 ` Zi Yan
@ 2026-06-11 7:45 ` Zhaoyang Huang
0 siblings, 0 replies; 16+ messages in thread
From: Zhaoyang Huang @ 2026-06-11 7:45 UTC
To: Zi Yan, jaegeuk, Chao Yu, jyescas@google.com
Cc: David Hildenbrand (Arm), zhaoyang.huang, Andrew Morton,
Lorenzo Stoakes, Barry Song, Baolin Wang, Lance Yang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, linux-mm,
linux-kernel, steve.kang, xiuhong.wang, hao_hao.wang

+f2fs and Android folks

@jaegeuk, Chao and jyescas: this thread is about an issue related to
f2fs. On Android's v6.18, we can reproduce the kernel panic reported in
this RFC with commit 9609dd704725 ("f2fs: remove non-uptodate folio from
the page cache in move_data_block") applied, and cannot reproduce it with
the commit reverted. Could you please take a look, or consider reverting
the suspect commit?

On Thu, Jun 11, 2026 at 11:06 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 10 Jun 2026, at 22:39, Zhaoyang Huang wrote:
>
> > On Thu, Jun 11, 2026 at 9:56 AM Zi Yan <ziy@nvidia.com> wrote:
> >>
> >> On 10 Jun 2026, at 21:39, Zhaoyang Huang wrote:
> >>
> >>> On Wed, Jun 10, 2026 at 10:38 PM Zi Yan <ziy@nvidia.com> wrote:
> >>>>
> >>>> On 10 Jun 2026, at 8:50, David Hildenbrand (Arm) wrote:
> >>>>
> >>>>> On 6/10/26 14:05, zhaoyang.huang wrote:
> >>>>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >>>>>>
> >>>>>> The kernel panics are keeping to be reported especially when the f2fs
> >>>>>> partition get almost full. By investigation, we find that the reason is
> >>>>>> one f2fs page got freed to buddy without being deleted from LRU and the
> >>>>>> root cause is the race happened in [2] which is enrolled by this commit.
> >>>>>> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove
> >>>>>> non-uptodate folio from the page cache in move_data_block").
> >>>>>
> >>>>> But I assume, that other FSes can trigger this as well? Any insights?
> >>>
> >>> Yes, I think all FSes support big folio could suffer from this defect.
> >>>
> >>>>>
> >>>>>>
> >>>>>> There are 3 race processes in this scenario, please find below for their
> >>>>>> main activities. However, by further investigation over the code, I
> >>>>>> think there is a common race window for the truncated folios between
> >>>>>> split_folio_to_order and folio_isolate_lru, where the folios lost the
> >>>>>> refcount on page cache and remains the transient one of the split
> >>>>>> caller, under which the folio could enter free path and compete with the
> >>>>>> isolation process. This commit would like to suggest to have the folios
> >>>>>> beyond EOF stay out of LRU.
> >>>>>>
> >>>>>> Truncate:
> >>>>>> The changed code in move_data_block() lets the GC path evict the tail-end
> >>>>>> folio from the page cache through folio_end_dropbehind(). Once
> >>>>>> folio_unmap_invalidate() removes the folio from mapping->i_pages, the
> >>>>>> page-cache references for all pages in the folio are dropped. The folio
> >>>>>> is then kept alive only by temporary external references, which allows a
> >>>>>> later split to operate on a folio whose subpages are no longer protected
> >>>>>> by page-cache references.
> >>>>>>
> >>>>>> Split:
> >>>>>> After the page-cache references are gone, split_folio_to_order() can
> >>>>>> split the big folio into individual pages and put the resulting subpages
> >>>>>> back on the LRU. For tail pages beyond EOF, split removes them from the
> >>>>>> page cache and drops their page-cache references. A tail page can then
> >>>>>> remain on the LRU with PG_lru set while holding only the split caller's
> >>>>>> temporary reference. When free_folio_and_swap_cache() drops that final
> >>>>>> reference, the page enters the final folio_put() release path.
> >>>>>>
> >>>>>> Isolate:
> >>>>>> In parallel, folio_isolate_lru() can observe the same tail page with a
> >>>>>> non-zero refcount and PG_lru set. It clears PG_lru before taking its own
> >>>>>> reference. If this races with the final folio_put() from the split path,
> >>>>>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
> >>>>>> The page is then freed back to the allocator while its lru links are
> >>>>>> still present in the LRU list. A later LRU operation on a neighboring
> >>>>>> page detects the stale link and reports list corruption.
> >>>>>
> >>>>> Complicated mess :(
> >>>>>
> >>>>> So, folio_isolate_lru() really only requires the caller to hold a folio
> >>>>> reference, which can happen given that we did the folio_ref_unfreeze(). It can,
> >>>>> for example, be triggered by memory offlining or page migration.
> >>>>>
> >>>>> So we really want to not allow folio_isolate_lru() while we are still processing
> >>>>> the folio.
> >>>>
> >>>> Or we should defer adding split folios to LRU after unfreeze.
> >>>>
> >>>>>
> >>>>> What your patch does is, simply not add folios that we will drop from the page
> >>>>> cache to the LRU?
> >>>>>
> >>>>>
> >>>>> You should describe here how you are fixing it: "Let's fix it by..."
> >>> Yes. This commit would like to suggest to fix it by having the folio
> >>> skip the lru_add_split_folio
> >>
> >> Skipping it causes more issues like LRU counter mismatch, firing up bad_page()
> >> since PG_active, PG_unevictable, or MGLRU fields in ->flags.f could stay
> >> uncleared at page free time.
> >
> > OK, we should solve this issue.
> >>
> >>>>>
> >>>>>>
> >>>>>> [1]
> >>>>>> [ 22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
> >>>>>> [ 22.486130] ------------[ cut here ]------------
> >>>>>> [ 22.486134] kernel BUG at lib/list_debug.c:67!
> >>>>>> [ 22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> >>>>>> [ 22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
> >>>>>> [ 22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
> >>>>>> [ 22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>>>>> [ 22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
> >>>>>> [ 22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
> >>>>>> [ 22.488539] sp : ffffffc08006b830
> >>>>>> [ 22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
> >>>>>> [ 22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
> >>>>>> [ 22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
> >>>>>> [ 22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
> >>>>>> [ 22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
> >>>>>> [ 22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
> >>>>>> [ 22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
> >>>>>> [ 22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
> >>>>>> [ 22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
> >>>>>> [ 22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
> >>>>>> [ 22.488647] Call trace:
> >>>>>> [ 22.488651] __list_del_entry_valid_or_report+0x14c/0x154 (P)
> >>>>>> [ 22.488661] __folio_put+0x2bc/0x434
> >>>>>> [ 22.488670] folio_put+0x28/0x58
> >>>>>> [ 22.488678] do_garbage_collect+0x1a34/0x2584
> >>>>>> [ 22.488689] f2fs_gc+0x230/0x9b4
> >>>>>> [ 22.488697] f2fs_fallocate+0xb90/0xdf4
> >>>>>> [ 22.488706] vfs_fallocate+0x1b4/0x2bc
> >>>>>> [ 22.488716] __arm64_sys_fallocate+0x44/0x78
> >>>>>> [ 22.488725] invoke_syscall+0x58/0xe4
> >>>>>> [ 22.488732] do_el0_svc+0x48/0xdc
> >>>>>> [ 22.488739] el0_svc+0x3c/0x98
> >>>>>> [ 22.488747] el0t_64_sync_handler+0x20/0x130
> >>>>>> [ 22.488754] el0t_64_sync+0x1c4/0x1c8
> >>>>>>
> >>>>>> [2]
> >>>>>> CPU0 (f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru)
> >>>>>>
> >>>>>> F: pagecache refs = n
> >>>>>> F: extra refs = GC + split
> >>>>>> F: PG_lru set
> >>>>>> move_data_block()
> >>>>>> folio = f2fs_grab_cache_folio(F)
> >>>>>> ...
> >>>>>> __folio_set_dropbehind(F)
> >>>>>> folio_unlock(F)
> >>>>>> folio_end_dropbehind(F)
> >>>>>> folio_unmap_invalidate(F)
> >>>>>> __filemap_remove_folio(F)
> >>>>>> folio_put_refs(F, n)
> >>>>>> folio_put(F)
> >>>>>> split_folio_to_order(F)
> >>>>>> folio_ref_freeze(F, 1)
> >>>>>> ...
> >>>>>> lru_add_split_folio(T)
> >>>>>> list_add_tail(&T->lru, &F->lru)
> >>>>>> folio_set_lru(T)
> >>>>>> __filemap_remove_folio(T)
> >>>>>> folio_put_refs(T, 1)
> >>>>>> /* T refcount == 1, PageLRU set */
> >>>>>> free_folio_and_swap_cache(T)
> >>>>>> folio_put(T)
> >>>>>> /* refcount: 1 -> 0 */
> >>>>>> folio_isolate_lru(T)
> >>>>
> >>>> If refcount is 0 at this point, VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio) in
> >>>> folio_isolate_lru() would be triggered. Maybe we could just return false in that case.
> >>> No, isolate caller will grab one refcount.
> >>
> >> As I said in another email, isolate caller cannot grab a refcount when folio refcount
> >> is 0.
> >
> > pin_user_pages*(..., FOLL_LONGTERM)
> >  └─ __gup_longterm_locked() [gup.c:2465]
> >     │ ├─ follow_page_pte() [gup.c:802]
> >     │ │ └─ try_grab_folio() [gup.c:858]
> >                 if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
> >                         return -ENOMEM;
> >
> >                 // Could __folio_split->folio_put could race here ?
> >                 if (flags & FOLL_GET)
> >                         folio_ref_add(folio, refs);
> >     └─ check_and_migrate_movable_pages() [gup.c:2490]
> >        └─ collect_longterm_unpinnable_folios() [gup.c:2391]
> >           └─ └─if (!folio_isolate_lru(folio))
> >
> > Could the __folio_split race in the above scenario? It looks like
> > try_grab_folio set the refcount without using atomic operation.
>
> folio_ref_add() used by try_grab_folio() is an atomic op.
> Which refcount change is not atomic here?

By atomic I mean that folio_try_get() is implemented with
atomic_add_unless(), while try_grab_folio() uses the sequence below,
which leaves a window for __folio_split() to race with it, right?

        if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
                ....
        if (flags & FOLL_GET)
                folio_ref_add(folio, refs);

>
> In addition, who is GUPing f2fs folio?

I don't know yet.

>
> I think you need to find the actual f2fs code path instead of
> chasing theoretical code combinations.

The test case passes after reverting the folio_end_dropbehind commit,
which leads us to believe this is the clue.

>
> >
> >> (from previous mail)
> >> Wait, if folio->mapping is NULL and folio is not anonymous,
> >> folio_check_splittable() returns false at the beginning of
> >> __folio_split(). So the split cannot happen.
> >
> > According to my understanding, the folio checked here is still big
> > folio which is locked and with folio->mapping set, right?
>
> But the provided trace says the folio is split after folio_end_dropbehind(F)
> and folio->mapping is NULL.

Please find more information from the coredump below. From the BUG_ON
output, the folio being list_del()'ed is fffffffec096e440, while its
lru.next folio fffffffec096e480 is the one that was wrongly freed to the
PCP lists without lruvec_del_folio() [1]. We can also see that
'folio(0xfffffffec096e440)->lru.prev = fffffffec0f639c0', where
fffffffec0f639c0 is a lone folio in the page cache (index 6188581) that
looks like the result of the fallocate [3]. So is it possible that the
split happens before the fallocate, and that the folio then gets
truncated and free_folio_and_swap_cache() races with folio_isolate_lru()?

[1]
[ 22.339229] list_del corruption. next->prev should be fffffffec096e448, but was ffffff80f9791830. (next=fffffffec096e488)

struct page 0xfffffffec096e440 {
  lru = {
    next = 0xfffffffec096e488,
    prev = 0xfffffffec096e408

[2]
fffffffec096e440 a5b91000 0 18 0 24 referenced,lru
fffffffec096e480 a5b92000 ffffff801e930481 73009e9 1 41028 uptodate,lru,owner_2,swapbacked
fffffffec096e4c0 a5b93000 ffffff801e930481 730033a 1 41028 uptodate,lru,owner_2,swapbacked

[3]
fffffffec33f9440 index: 76446 position: root/0/18/42/30
fffffffec00da9c0 index: 76448 position: root/0/18/42/32
fffffffec3ded040 index: 76449 position: root/0/18/42/33
fffffffec0f639c0 index: 6188581 position: root/23/38/56/37
fffffffec0f63a00 index: 6188853 position: root/23/38/60/53
fffffffec0f63a40 index: 6188854 position: root/23/38/60/54

[4]
CPU0 (f2fs GC) CPU1 (split_folio_to_order) CPU2 (folio_isolate_lru)
split_folio_to_order(F)
folio_ref_freeze(F, 1)
...
lru_add_split_folio(T)
list_add_tail(&T->lru, &F->lru)
folio_set_lru(T)
__filemap_remove_folio(T)
folio_put_refs(T, 1)
folio_unlock(new_folio);
move_data_block()
folio = f2fs_grab_cache_folio(F)
...
__folio_set_dropbehind(F)
folio_unlock(F)
folio_end_dropbehind(F)
folio_unmap_invalidate(F)
__filemap_remove_folio(F)
folio_put_refs(F, n)
folio_put(F)
/* T refcount == 1, PageLRU set */
free_folio_and_swap_cache(T)
folio_put(T)
/* refcount: 1 -> 0 */
folio_isolate_lru(T)
folio_test_clear_lru(T)
__folio_put(T)
__page_cache_release(T)
folio_test_lru(T) == false
/* skip lruvec_del_folio(T) */
free_frozen_pages(T)
folio_get(T)
lruvec_del_folio(T)

>
> >>
> >>>>
> >>>>>> folio_test_clear_lru(T)
> >>>>>> __folio_put(T)
> >>>>>> __page_cache_release(T)
> >>>>>> folio_test_lru(T) == false
> >>>>>> /* skip lruvec_del_folio(T) */
> >>>>>> free_frozen_pages(T)
> >>>>>> folio_get(T)
> >>>>>> lruvec_del_folio(T)
> >>>>
> >>>> But in CPU2 (folio_isolate_lru), lruvec_del_folio(T) should remove T from LRU list.
> >>>>
> >>>>>> later:
> >>>>>> list_del(adjacent->lru)
> >>>>>> next == &T->lru
> >>>>>> next->prev == LIST_POISON / PCP freelist
> >>>>>> BUG
> >>>>>>
> >>>>
> >>>> Why does CPU0 still see the stale link from adjacent?
> >>> The staled link should be from LRU since the folio never be deleted from lru.
> >>>>
> >>>>>> Assisted-by: Cursor:claude-opus-4-8
> >>>>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >>>>>
> >>>>> I'm wondering if this has been broken the whole time, or if some rework allowed
> >>>>> this to trigger.
> >>> This issue is from AOSP with v6.18 which just supports big folio in
> >>> f2fs. Besides, it is triggered by the timing of f2fs's partition get
> >>> almost full during the test case of filling f2fs's partition(should be
> >>> the trigger factor of f2fs's gc which enroll truncate thing)
> >>
> >> Are you able to reproduce it with other FSes supporting large folio?
> >
> > Sorry, I can't so far since only f2fs has gc in the Android system.
>
> Have you checked f2fs gc code to make sure it is working correctly?
> BTW, what makes you think the issue is related to folio_split()?
> Can you elaborate more on your investigation?
>
> Thanks.
>
>
> --
> Best Regards,
> Yan, Zi
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
2026-06-10 12:05 [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU zhaoyang.huang
2026-06-10 12:50 ` David Hildenbrand (Arm)
@ 2026-06-10 20:30 ` Andrew Morton
2026-06-10 20:36 ` Zi Yan
2026-06-11 7:33 ` [syzbot ci] " syzbot ci
2026-06-11 9:30 ` [RFC PATCH] " Lorenzo Stoakes
3 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2026-06-10 20:30 UTC
To: zhaoyang.huang
Cc: David Hildenbrand, Zi Yan, Lorenzo Stoakes, Barry Song,
Baolin Wang, Lance Yang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, linux-mm, linux-kernel, Zhaoyang Huang, steve.kang

On Wed, 10 Jun 2026 20:05:35 +0800 "zhaoyang.huang" <zhaoyang.huang@unisoc.com> wrote:

> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> The kernel panics are keeping to be reported especially when the f2fs
> partition get almost full. By investigation, we find that the reason is
> one f2fs page got freed to buddy without being deleted from LRU and the
> root cause is the race happened in [2] which is enrolled by this commit.
> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove
> non-uptodate folio from the page cache in move_data_block").
>
> There are 3 race processes in this scenario, please find below for their
> main activities. However, by further investigation over the code, I
> think there is a common race window for the truncated folios between
> split_folio_to_order and folio_isolate_lru, where the folios lost the
> refcount on page cache and remains the transient one of the split
> caller, under which the folio could enter free path and compete with the
> isolation process. This commit would like to suggest to have the folios
> beyond EOF stay out of LRU.
>
> Truncate:
> The changed code in move_data_block() lets the GC path evict the tail-end
> folio from the page cache through folio_end_dropbehind(). Once
> folio_unmap_invalidate() removes the folio from mapping->i_pages, the
> page-cache references for all pages in the folio are dropped. The folio
> is then kept alive only by temporary external references, which allows a
> later split to operate on a folio whose subpages are no longer protected
> by page-cache references.
>
> Split:
> After the page-cache references are gone, split_folio_to_order() can
> split the big folio into individual pages and put the resulting subpages
> back on the LRU. For tail pages beyond EOF, split removes them from the
> page cache and drops their page-cache references. A tail page can then
> remain on the LRU with PG_lru set while holding only the split caller's
> temporary reference. When free_folio_and_swap_cache() drops that final
> reference, the page enters the final folio_put() release path.
>
> Isolate:
> In parallel, folio_isolate_lru() can observe the same tail page with a
> non-zero refcount and PG_lru set. It clears PG_lru before taking its own
> reference. If this races with the final folio_put() from the split path,
> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
> The page is then freed back to the allocator while its lru links are
> still present in the LRU list. A later LRU operation on a neighboring
> page detects the stale link and reports list corruption.

Thanks. Sashiko AI review might have found some problems with folio
flags:

https://sashiko.dev/#/patchset/20260610120535.2370844-1-zhaoyang.huang@unisoc.com
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
2026-06-10 20:30 ` Andrew Morton
@ 2026-06-10 20:36 ` Zi Yan
0 siblings, 0 replies; 16+ messages in thread
From: Zi Yan @ 2026-06-10 20:36 UTC (permalink / raw)
To: Andrew Morton
Cc: zhaoyang.huang, David Hildenbrand, Lorenzo Stoakes, Barry Song,
Baolin Wang, Lance Yang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, linux-mm, linux-kernel, Zhaoyang Huang, steve.kang

On 10 Jun 2026, at 16:30, Andrew Morton wrote:

> On Wed, 10 Jun 2026 20:05:35 +0800 "zhaoyang.huang" <zhaoyang.huang@unisoc.com> wrote:
>
>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>
>> The kernel panics are keeping to be reported especially when the f2fs
>> partition get almost full. By investigation, we find that the reason is
>> one f2fs page got freed to buddy without being deleted from LRU and the
>> root cause is the race happened in [2] which is enrolled by this commit.
>> We solve this issue by reverting a f2fs commit 9609dd704725 ("f2fs: remove
>> non-uptodate folio from the page cache in move_data_block").
>>
>> There are 3 race processes in this scenario, please find below for their
>> main activities. However, by further investigation over the code, I
>> think there is a common race window for the truncated folios between
>> split_folio_to_order and folio_isolate_lru, where the folios lost the
>> refcount on page cache and remains the transient one of the split
>> caller, under which the folio could enter free path and compete with the
>> isolation process. This commit would like to suggest to have the folios
>> beyond EOF stay out of LRU.
>>
>> Truncate:
>> The changed code in move_data_block() lets the GC path evict the tail-end
>> folio from the page cache through folio_end_dropbehind(). Once
>> folio_unmap_invalidate() removes the folio from mapping->i_pages, the
>> page-cache references for all pages in the folio are dropped. The folio
>> is then kept alive only by temporary external references, which allows a
>> later split to operate on a folio whose subpages are no longer protected
>> by page-cache references.
>>
>> Split:
>> After the page-cache references are gone, split_folio_to_order() can
>> split the big folio into individual pages and put the resulting subpages
>> back on the LRU. For tail pages beyond EOF, split removes them from the
>> page cache and drops their page-cache references. A tail page can then
>> remain on the LRU with PG_lru set while holding only the split caller's
>> temporary reference. When free_folio_and_swap_cache() drops that final
>> reference, the page enters the final folio_put() release path.
>>
>> Isolate:
>> In parallel, folio_isolate_lru() can observe the same tail page with a
>> non-zero refcount and PG_lru set. It clears PG_lru before taking its own
>> reference. If this races with the final folio_put() from the split path,
>> __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
>> The page is then freed back to the allocator while its lru links are
>> still present in the LRU list. A later LRU operation on a neighboring
>> page detects the stale link and reports list corruption.
>
> Thanks. Sashiko AI review might have found some problems with folio
> flags:
>
> https://sashiko.dev/#/patchset/20260610120535.2370844-1-zhaoyang.huang@unisoc.com

Claude also raised the same concern when I was reasoning about this issue.
At least for now, my conclusion is that the race between folio_split() and
folio_isolate_lru() should not cause the issue and something else is wrong.

Best Regards,
Yan, Zi
* [syzbot ci] Re: mm/huge_memory: do not add dropped split tail folios to LRU
2026-06-10 12:05 [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU zhaoyang.huang
2026-06-10 12:50 ` David Hildenbrand (Arm)
2026-06-10 20:30 ` Andrew Morton
@ 2026-06-11 7:33 ` syzbot ci
2026-06-11 9:30 ` [RFC PATCH] " Lorenzo Stoakes
3 siblings, 0 replies; 16+ messages in thread
From: syzbot ci @ 2026-06-11 7:33 UTC (permalink / raw)
To: akpm, baohua, baolin.wang, david, dev.jain, huangzhaoyang,
lance.yang, liam.howlett, linux-kernel, linux-mm, lorenzo.stoakes,
npache, ryan.roberts, steve.kang, zhaoyang.huang, ziy
Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v1] mm/huge_memory: do not add dropped split tail folios to LRU
https://lore.kernel.org/all/20260610120535.2370844-1-zhaoyang.huang@unisoc.com
* [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU

and found the following issues:
* BUG: Bad page state in ext4_write_begin
* BUG: Bad page state in iomap_write_begin
* BUG: Bad page state in shmem_get_folio_gfp

Full report is available here:
https://ci.syzbot.org/series/c3e122ba-1000-4581-ba3f-237f41482af8

***

BUG: Bad page state in ext4_write_begin

tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: 1ec3cca2d8b6b9ff6584ca626d4c8918bbf48d44
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/ffde37a3-aed0-4f49-bba1-ca31cd6a4b04/config
syz repro: https://ci.syzbot.org/findings/07322c5f-4419-4281-bbd5-1b06eebe91f2/syz_repro

ext2 filesystem being mounted at /0/file1 supports timestamps until 2038-01-19 (0x7fffffff)
BUG: Bad page state in process syz.0.17 pfn:11e231
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x11e231
flags: 0x17ff20000000000(node=0|zone=2|lastcpupid=0x7ff)
raw: 017ff20000000000 0000000000000000 00000000ffffffff 0000000000000000
raw:
0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set page_owner tracks the page as allocated page last allocated via order 0, migratetype Movable, gfp_mask 0x153cca(GFP_HIGHUSER_MOVABLE|__GFP_WRITE|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 5841, tgid 5840 (syz.0.17), ts 75851604747, free_ts 72751451789 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853 prep_new_page mm/page_alloc.c:1861 [inline] get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221 alloc_pages_mpol+0x235/0x490 mm/mempolicy.c:2490 alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline] alloc_pages_noprof+0xac/0x2a0 mm/mempolicy.c:2581 folio_alloc_noprof+0x1e/0x30 mm/mempolicy.c:2591 filemap_alloc_folio_noprof+0x111/0x470 mm/filemap.c:1014 __filemap_get_folio_mpol+0x3fc/0xb00 mm/filemap.c:2012 __filemap_get_folio include/linux/pagemap.h:763 [inline] write_begin_get_folio include/linux/pagemap.h:789 [inline] ext4_write_begin+0x4ad/0x1890 fs/ext4/inode.c:1331 generic_perform_write+0x2e2/0x8f0 mm/filemap.c:4325 ext4_buffered_write_iter+0xce/0x3a0 fs/ext4/file.c:316 ext4_file_write_iter+0x298/0x1bf0 fs/ext4/file.c:-1 new_sync_write fs/read_write.c:595 [inline] vfs_write+0x61d/0xb90 fs/read_write.c:688 ksys_pwrite64 fs/read_write.c:795 [inline] __do_sys_pwrite64 fs/read_write.c:803 [inline] __se_sys_pwrite64 fs/read_write.c:800 [inline] __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f page last free pid 5718 tgid 5718 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] __free_pages_prepare mm/page_alloc.c:1397 [inline] free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008 free_pages_and_swap_cache+0x41d/0x490 
mm/swap_state.c:404 __tlb_batch_free_encoded_pages mm/mmu_gather.c:138 [inline] tlb_batch_pages_flush mm/mmu_gather.c:151 [inline] tlb_flush_mmu_free mm/mmu_gather.c:417 [inline] tlb_flush_mmu+0x6d3/0xa30 mm/mmu_gather.c:424 tlb_finish_mmu+0xf9/0x230 mm/mmu_gather.c:549 exit_mmap+0x498/0x9e0 mm/mmap.c:1313 __mmput+0x118/0x430 kernel/fork.c:1178 exit_mm+0x1f6/0x2d0 kernel/exit.c:582 do_exit+0x6a2/0x22c0 kernel/exit.c:964 do_group_exit+0x21b/0x2d0 kernel/exit.c:1119 get_signal+0x1284/0x1330 kernel/signal.c:3037 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline] exit_to_user_mode_loop+0xa9/0x680 kernel/entry/common.c:98 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline] syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline] do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f Modules linked in: CPU: 0 UID: 0 PID: 5841 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 bad_page+0x17f/0x1c0 mm/page_alloc.c:632 free_page_is_bad mm/page_alloc.c:1076 [inline] __free_pages_prepare mm/page_alloc.c:1388 [inline] __free_frozen_pages+0xcd9/0xd30 mm/page_alloc.c:2938 __folio_put+0x4a2/0x580 mm/swap.c:112 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199 try_folio_split_to_order include/linux/huge_mm.h:411 [inline] try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416 ext4_truncate_failed_write fs/ext4/truncate.h:21 [inline] ext4_write_end+0x784/0xa30 fs/ext4/inode.c:1495 generic_perform_write+0x620/0x8f0 mm/filemap.c:4346 ext4_buffered_write_iter+0xce/0x3a0 fs/ext4/file.c:316 
ext4_file_write_iter+0x298/0x1bf0 fs/ext4/file.c:-1 new_sync_write fs/read_write.c:595 [inline] vfs_write+0x61d/0xb90 fs/read_write.c:688 ksys_pwrite64 fs/read_write.c:795 [inline] __do_sys_pwrite64 fs/read_write.c:803 [inline] __se_sys_pwrite64 fs/read_write.c:800 [inline] __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fb06359ce59 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fb0643ff028 EFLAGS: 00000246 ORIG_RAX: 0000000000000012 RAX: ffffffffffffffda RBX: 00007fb063815fa0 RCX: 00007fb06359ce59 RDX: 000000000000fdef RSI: 0000200000000140 RDI: 0000000000000004 RBP: 00007fb063632d6f R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000c00 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fb063816038 R14: 00007fb063815fa0 R15: 00007ffe99cbba98 </TASK> BUG: Bad page state in process syz.0.17 pfn:11e232 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2 pfn:0x11e232 head: order:0 mapcount:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0 flags: 0x17ff20000000040(head|node=0|zone=2|lastcpupid=0x7ff) raw: 017ff20000000040 0000000000000000 ffffea0004788c90 0000000000000000 raw: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000 head: 017ff20000000040 0000000000000000 ffffea0004788c90 0000000000000000 head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000 head: 017ff00000000000 0000000000000000 00000000ffffffff 0000000000000000 head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set page_owner tracks the page as allocated page last allocated via order 1, migratetype Movable, gfp_mask 
0x153cca(GFP_HIGHUSER_MOVABLE|__GFP_WRITE|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 5841, tgid 5840 (syz.0.17), ts 75851604747, free_ts 72751458324 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853 prep_new_page mm/page_alloc.c:1861 [inline] get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221 alloc_pages_mpol+0x235/0x490 mm/mempolicy.c:2490 alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline] alloc_pages_noprof+0xac/0x2a0 mm/mempolicy.c:2581 folio_alloc_noprof+0x1e/0x30 mm/mempolicy.c:2591 filemap_alloc_folio_noprof+0x111/0x470 mm/filemap.c:1014 __filemap_get_folio_mpol+0x3fc/0xb00 mm/filemap.c:2012 __filemap_get_folio include/linux/pagemap.h:763 [inline] write_begin_get_folio include/linux/pagemap.h:789 [inline] ext4_write_begin+0x4ad/0x1890 fs/ext4/inode.c:1331 generic_perform_write+0x2e2/0x8f0 mm/filemap.c:4325 ext4_buffered_write_iter+0xce/0x3a0 fs/ext4/file.c:316 ext4_file_write_iter+0x298/0x1bf0 fs/ext4/file.c:-1 new_sync_write fs/read_write.c:595 [inline] vfs_write+0x61d/0xb90 fs/read_write.c:688 ksys_pwrite64 fs/read_write.c:795 [inline] __do_sys_pwrite64 fs/read_write.c:803 [inline] __se_sys_pwrite64 fs/read_write.c:800 [inline] __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f page last free pid 5718 tgid 5718 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] __free_pages_prepare mm/page_alloc.c:1397 [inline] free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008 free_pages_and_swap_cache+0x41d/0x490 mm/swap_state.c:404 __tlb_batch_free_encoded_pages mm/mmu_gather.c:138 [inline] tlb_batch_pages_flush mm/mmu_gather.c:151 [inline] tlb_flush_mmu_free mm/mmu_gather.c:417 [inline] tlb_flush_mmu+0x6d3/0xa30 mm/mmu_gather.c:424 
tlb_finish_mmu+0xf9/0x230 mm/mmu_gather.c:549 exit_mmap+0x498/0x9e0 mm/mmap.c:1313 __mmput+0x118/0x430 kernel/fork.c:1178 exit_mm+0x1f6/0x2d0 kernel/exit.c:582 do_exit+0x6a2/0x22c0 kernel/exit.c:964 do_group_exit+0x21b/0x2d0 kernel/exit.c:1119 get_signal+0x1284/0x1330 kernel/signal.c:3037 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline] exit_to_user_mode_loop+0xa9/0x680 kernel/entry/common.c:98 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline] syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline] do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f Modules linked in: CPU: 1 UID: 0 PID: 5841 Comm: syz.0.17 Tainted: G B syzkaller #0 PREEMPT(full) Tainted: [B]=BAD_PAGE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 bad_page+0x17f/0x1c0 mm/page_alloc.c:632 free_page_is_bad mm/page_alloc.c:1076 [inline] __free_pages_prepare mm/page_alloc.c:1388 [inline] __free_frozen_pages+0xcd9/0xd30 mm/page_alloc.c:2938 __folio_put+0x4a2/0x580 mm/swap.c:112 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199 try_folio_split_to_order include/linux/huge_mm.h:411 [inline] try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416 ext4_truncate_failed_write fs/ext4/truncate.h:21 [inline] ext4_write_end+0x784/0xa30 fs/ext4/inode.c:1495 generic_perform_write+0x620/0x8f0 mm/filemap.c:4346 ext4_buffered_write_iter+0xce/0x3a0 fs/ext4/file.c:316 ext4_file_write_iter+0x298/0x1bf0 fs/ext4/file.c:-1 new_sync_write fs/read_write.c:595 [inline] vfs_write+0x61d/0xb90 fs/read_write.c:688 ksys_pwrite64 fs/read_write.c:795 [inline] __do_sys_pwrite64 
fs/read_write.c:803 [inline] __se_sys_pwrite64 fs/read_write.c:800 [inline] __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fb06359ce59 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fb0643ff028 EFLAGS: 00000246 ORIG_RAX: 0000000000000012 RAX: ffffffffffffffda RBX: 00007fb063815fa0 RCX: 00007fb06359ce59 RDX: 000000000000fdef RSI: 0000200000000140 RDI: 0000000000000004 RBP: 00007fb063632d6f R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000c00 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fb063816038 R14: 00007fb063815fa0 R15: 00007ffe99cbba98 </TASK> BUG: Bad page state in process syz.0.17 pfn:11e234 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x4 pfn:0x11e234 head: order:0 mapcount:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0 flags: 0x17ff20000000040(head|node=0|zone=2|lastcpupid=0x7ff) raw: 017ff20000000040 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000 head: 017ff20000000040 0000000000000000 dead000000000122 0000000000000000 head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000 head: 017ff00000000000 0000000000000000 00000000ffffffff 0000000000000000 head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set page_owner tracks the page as allocated page last allocated via order 2, migratetype Movable, gfp_mask 0x153cca(GFP_HIGHUSER_MOVABLE|__GFP_WRITE|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 5841, tgid 5840 (syz.0.17), ts 75851604747, free_ts 72751484534 set_page_owner include/linux/page_owner.h:32 [inline] 
post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853 prep_new_page mm/page_alloc.c:1861 [inline] get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221 alloc_pages_mpol+0x235/0x490 mm/mempolicy.c:2490 alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline] alloc_pages_noprof+0xac/0x2a0 mm/mempolicy.c:2581 folio_alloc_noprof+0x1e/0x30 mm/mempolicy.c:2591 filemap_alloc_folio_noprof+0x111/0x470 mm/filemap.c:1014 __filemap_get_folio_mpol+0x3fc/0xb00 mm/filemap.c:2012 __filemap_get_folio include/linux/pagemap.h:763 [inline] write_begin_get_folio include/linux/pagemap.h:789 [inline] ext4_write_begin+0x4ad/0x1890 fs/ext4/inode.c:1331 generic_perform_write+0x2e2/0x8f0 mm/filemap.c:4325 ext4_buffered_write_iter+0xce/0x3a0 fs/ext4/file.c:316 ext4_file_write_iter+0x298/0x1bf0 fs/ext4/file.c:-1 new_sync_write fs/read_write.c:595 [inline] vfs_write+0x61d/0xb90 fs/read_write.c:688 ksys_pwrite64 fs/read_write.c:795 [inline] __do_sys_pwrite64 fs/read_write.c:803 [inline] __se_sys_pwrite64 fs/read_write.c:800 [inline] __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f page last free pid 5718 tgid 5718 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] __free_pages_prepare mm/page_alloc.c:1397 [inline] free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008 free_pages_and_swap_cache+0x41d/0x490 mm/swap_state.c:404 __tlb_batch_free_encoded_pages mm/mmu_gather.c:138 [inline] tlb_batch_pages_flush mm/mmu_gather.c:151 [inline] tlb_flush_mmu_free mm/mmu_gather.c:417 [inline] tlb_flush_mmu+0x6d3/0xa30 mm/mmu_gather.c:424 tlb_finish_mmu+0xf9/0x230 mm/mmu_gather.c:549 exit_mmap+0x498/0x9e0 mm/mmap.c:1313 __mmput+0x118/0x430 kernel/fork.c:1178 exit_mm+0x1f6/0x2d0 kernel/exit.c:582 do_exit+0x6a2/0x22c0 kernel/exit.c:964 
do_group_exit+0x21b/0x2d0 kernel/exit.c:1119 get_signal+0x1284/0x1330 kernel/signal.c:3037 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline] exit_to_user_mode_loop+0xa9/0x680 kernel/entry/common.c:98 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline] syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline] do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f Modules linked in: CPU: 1 UID: 0 PID: 5841 Comm: syz.0.17 Tainted: G B syzkaller #0 PREEMPT(full) Tainted: [B]=BAD_PAGE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 bad_page+0x17f/0x1c0 mm/page_alloc.c:632 free_page_is_bad mm/page_alloc.c:1076 [inline] __free_pages_prepare mm/page_alloc.c:1388 [inline] __free_frozen_pages+0xcd9/0xd30 mm/page_alloc.c:2938 __folio_put+0x4a2/0x580 mm/swap.c:112 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199 try_folio_split_to_order include/linux/huge_mm.h:411 [inline] try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416 ext4_truncate_failed_write fs/ext4/truncate.h:21 [inline] ext4_write_end+0x784/0xa30 fs/ext4/inode.c:1495 generic_perform_write+0x620/0x8f0 mm/filemap.c:4346 ext4_buffered_write_iter+0xce/0x3a0 fs/ext4/file.c:316 ext4_file_write_iter+0x298/0x1bf0 fs/ext4/file.c:-1 new_sync_write fs/read_write.c:595 [inline] vfs_write+0x61d/0xb90 fs/read_write.c:688 ksys_pwrite64 fs/read_write.c:795 [inline] __do_sys_pwrite64 fs/read_write.c:803 [inline] __se_sys_pwrite64 fs/read_write.c:800 [inline] __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] 
do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fb06359ce59 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fb0643ff028 EFLAGS: 00000246 ORIG_RAX: 0000000000000012 RAX: ffffffffffffffda RBX: 00007fb063815fa0 RCX: 00007fb06359ce59 RDX: 000000000000fdef RSI: 0000200000000140 RDI: 0000000000000004 RBP: 00007fb063632d6f R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000c00 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fb063816038 R14: 00007fb063815fa0 R15: 00007ffe99cbba98 </TASK> *** BUG: Bad page state in iomap_write_begin tree: mm-new URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git base: 1ec3cca2d8b6b9ff6584ca626d4c8918bbf48d44 arch: amd64 compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8 config: https://ci.syzbot.org/builds/ffde37a3-aed0-4f49-bba1-ca31cd6a4b04/config syz repro: https://ci.syzbot.org/findings/8030d7fe-0d2e-4e47-ab50-b1211533d9c1/syz_repro XFS (loop0): Mounting V5 Filesystem d7dc424e-7990-42cb-9f91-9cb7200a101d XFS (loop0): Ending clean mount BUG: Bad page state in process syz.0.17 pfn:1a6481 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x8081 pfn:0x1a6481 flags: 0x57ff20000000000(node=1|zone=2|lastcpupid=0x7ff) raw: 057ff20000000000 0000000000000000 00000000ffffffff 0000000000000000 raw: 0000000000008081 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set page_owner tracks the page as allocated page last allocated via order 0, migratetype Movable, gfp_mask 0x153c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_MOVABLE|__GFP_WRITE|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL), pid 5877, tgid 5876 (syz.0.17), ts 79178255762, free_ts 72347127723 
set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853 prep_new_page mm/page_alloc.c:1861 [inline] get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221 alloc_pages_mpol+0x235/0x490 mm/mempolicy.c:2490 alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline] alloc_pages_noprof+0xac/0x2a0 mm/mempolicy.c:2581 folio_alloc_noprof+0x1e/0x30 mm/mempolicy.c:2591 filemap_alloc_folio_noprof+0x111/0x470 mm/filemap.c:1014 __filemap_get_folio_mpol+0x3fc/0xb00 mm/filemap.c:2012 __filemap_get_folio include/linux/pagemap.h:763 [inline] iomap_get_folio fs/iomap/buffered-io.c:725 [inline] __iomap_get_folio fs/iomap/buffered-io.c:896 [inline] iomap_write_begin+0x6d9/0x14f0 fs/iomap/buffered-io.c:960 iomap_write_iter fs/iomap/buffered-io.c:1144 [inline] iomap_file_buffered_write+0x47a/0xb30 fs/iomap/buffered-io.c:1225 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056 new_sync_write fs/read_write.c:595 [inline] vfs_write+0x61d/0xb90 fs/read_write.c:688 ksys_pwrite64 fs/read_write.c:795 [inline] __do_sys_pwrite64 fs/read_write.c:803 [inline] __se_sys_pwrite64 fs/read_write.c:800 [inline] __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f page last free pid 5710 tgid 5710 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] __free_pages_prepare mm/page_alloc.c:1397 [inline] free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008 free_pages_and_swap_cache+0x2b9/0x490 mm/swap_state.c:401 __tlb_batch_free_encoded_pages mm/mmu_gather.c:138 [inline] tlb_batch_pages_flush mm/mmu_gather.c:151 [inline] tlb_flush_mmu_free mm/mmu_gather.c:417 [inline] tlb_flush_mmu+0x6d3/0xa30 mm/mmu_gather.c:424 tlb_finish_mmu+0xf9/0x230 mm/mmu_gather.c:549 exit_mmap+0x498/0x9e0 
mm/mmap.c:1313 __mmput+0x118/0x430 kernel/fork.c:1178 exit_mm+0x1f6/0x2d0 kernel/exit.c:582 do_exit+0x6a2/0x22c0 kernel/exit.c:964 do_group_exit+0x21b/0x2d0 kernel/exit.c:1119 get_signal+0x1284/0x1330 kernel/signal.c:3037 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline] exit_to_user_mode_loop+0xa9/0x680 kernel/entry/common.c:98 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline] syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline] do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f Modules linked in: CPU: 1 UID: 0 PID: 5877 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 bad_page+0x17f/0x1c0 mm/page_alloc.c:632 free_page_is_bad mm/page_alloc.c:1076 [inline] __free_pages_prepare mm/page_alloc.c:1388 [inline] __free_frozen_pages+0xcd9/0xd30 mm/page_alloc.c:2938 __folio_put+0x4a2/0x580 mm/swap.c:112 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199 try_folio_split_to_order include/linux/huge_mm.h:411 [inline] try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416 iomap_write_failed fs/iomap/buffered-io.c:785 [inline] iomap_write_iter fs/iomap/buffered-io.c:1187 [inline] iomap_file_buffered_write+0x788/0xb30 fs/iomap/buffered-io.c:1225 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056 new_sync_write fs/read_write.c:595 [inline] vfs_write+0x61d/0xb90 fs/read_write.c:688 ksys_pwrite64 fs/read_write.c:795 [inline] __do_sys_pwrite64 fs/read_write.c:803 [inline] __se_sys_pwrite64 fs/read_write.c:800 [inline] __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800 
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f3f0719ce59 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3f07ff7028 EFLAGS: 00000246 ORIG_RAX: 0000000000000012 RAX: ffffffffffffffda RBX: 00007f3f07415fa0 RCX: 00007f3f0719ce59 RDX: 00000000ffffffb7 RSI: 0000200000000040 RDI: 0000000000000004 RBP: 00007f3f07232d6f R08: 0000000000000000 R09: 0000000000000000 R10: 0000000008080c61 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f3f07416038 R14: 00007f3f07415fa0 R15: 00007ffe415e9148 </TASK> BUG: Bad page state in process syz.0.17 pfn:1a6482 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x8082 pfn:0x1a6482 head: order:0 mapcount:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0 flags: 0x57ff20000000040(head|node=1|zone=2|lastcpupid=0x7ff) raw: 057ff20000000040 0000000000000000 ffffea0006992090 0000000000000000 raw: 0000000000008082 0000000000000000 00000000ffffffff 0000000000000000 head: 057ff20000000040 0000000000000000 ffffea0006992090 0000000000000000 head: 0000000000008082 0000000000000000 00000000ffffffff 0000000000000000 head: 057ff00000000000 0000000000000000 00000000ffffffff 0000000000000000 head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set page_owner tracks the page as allocated page last allocated via order 1, migratetype Movable, gfp_mask 0x153c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_MOVABLE|__GFP_WRITE|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL), pid 5877, tgid 5876 (syz.0.17), ts 79178255762, free_ts 72347116236 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853 prep_new_page mm/page_alloc.c:1861 [inline] 
get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221 alloc_pages_mpol+0x235/0x490 mm/mempolicy.c:2490 alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline] alloc_pages_noprof+0xac/0x2a0 mm/mempolicy.c:2581 folio_alloc_noprof+0x1e/0x30 mm/mempolicy.c:2591 filemap_alloc_folio_noprof+0x111/0x470 mm/filemap.c:1014 __filemap_get_folio_mpol+0x3fc/0xb00 mm/filemap.c:2012 __filemap_get_folio include/linux/pagemap.h:763 [inline] iomap_get_folio fs/iomap/buffered-io.c:725 [inline] __iomap_get_folio fs/iomap/buffered-io.c:896 [inline] iomap_write_begin+0x6d9/0x14f0 fs/iomap/buffered-io.c:960 iomap_write_iter fs/iomap/buffered-io.c:1144 [inline] iomap_file_buffered_write+0x47a/0xb30 fs/iomap/buffered-io.c:1225 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056 new_sync_write fs/read_write.c:595 [inline] vfs_write+0x61d/0xb90 fs/read_write.c:688 ksys_pwrite64 fs/read_write.c:795 [inline] __do_sys_pwrite64 fs/read_write.c:803 [inline] __se_sys_pwrite64 fs/read_write.c:800 [inline] __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f page last free pid 5710 tgid 5710 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] __free_pages_prepare mm/page_alloc.c:1397 [inline] free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008 free_pages_and_swap_cache+0x2b9/0x490 mm/swap_state.c:401 __tlb_batch_free_encoded_pages mm/mmu_gather.c:138 [inline] tlb_batch_pages_flush mm/mmu_gather.c:151 [inline] tlb_flush_mmu_free mm/mmu_gather.c:417 [inline] tlb_flush_mmu+0x6d3/0xa30 mm/mmu_gather.c:424 tlb_finish_mmu+0xf9/0x230 mm/mmu_gather.c:549 exit_mmap+0x498/0x9e0 mm/mmap.c:1313 __mmput+0x118/0x430 kernel/fork.c:1178 exit_mm+0x1f6/0x2d0 kernel/exit.c:582 do_exit+0x6a2/0x22c0 kernel/exit.c:964 do_group_exit+0x21b/0x2d0 
kernel/exit.c:1119 get_signal+0x1284/0x1330 kernel/signal.c:3037 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline] exit_to_user_mode_loop+0xa9/0x680 kernel/entry/common.c:98 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline] syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline] do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f Modules linked in: CPU: 0 UID: 0 PID: 5877 Comm: syz.0.17 Tainted: G B syzkaller #0 PREEMPT(full) Tainted: [B]=BAD_PAGE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 bad_page+0x17f/0x1c0 mm/page_alloc.c:632 free_page_is_bad mm/page_alloc.c:1076 [inline] __free_pages_prepare mm/page_alloc.c:1388 [inline] __free_frozen_pages+0xcd9/0xd30 mm/page_alloc.c:2938 __folio_put+0x4a2/0x580 mm/swap.c:112 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199 try_folio_split_to_order include/linux/huge_mm.h:411 [inline] try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416 iomap_write_failed fs/iomap/buffered-io.c:785 [inline] iomap_write_iter fs/iomap/buffered-io.c:1187 [inline] iomap_file_buffered_write+0x788/0xb30 fs/iomap/buffered-io.c:1225 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056 new_sync_write fs/read_write.c:595 [inline] vfs_write+0x61d/0xb90 fs/read_write.c:688 ksys_pwrite64 fs/read_write.c:795 [inline] __do_sys_pwrite64 fs/read_write.c:803 [inline] __se_sys_pwrite64 fs/read_write.c:800 [inline] __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94 
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3f0719ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3f07ff7028 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
RAX: ffffffffffffffda RBX: 00007f3f07415fa0 RCX: 00007f3f0719ce59
RDX: 00000000ffffffb7 RSI: 0000200000000040 RDI: 0000000000000004
RBP: 00007f3f07232d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008080c61 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3f07416038 R14: 00007f3f07415fa0 R15: 00007ffe415e9148
 </TASK>
BUG: Bad page state in process syz.0.17 pfn:1a6484
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x8084 pfn:0x1a6484
head: order:0 mapcount:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0
flags: 0x57ff20000000040(head|node=1|zone=2|lastcpupid=0x7ff)
raw: 057ff20000000040 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000008084 0000000000000000 00000000ffffffff 0000000000000000
head: 057ff20000000040 0000000000000000 dead000000000122 0000000000000000
head: 0000000000008084 0000000000000000 00000000ffffffff 0000000000000000
head: 057ff00000000000 0000000000000000 00000000ffffffff 0000000000000000
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Movable, gfp_mask 0x153c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_MOVABLE|__GFP_WRITE|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL), pid 5877, tgid 5876 (syz.0.17), ts 79178255762, free_ts 72347038008
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_pages_mpol+0x235/0x490 mm/mempolicy.c:2490
 alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline]
 alloc_pages_noprof+0xac/0x2a0 mm/mempolicy.c:2581
 folio_alloc_noprof+0x1e/0x30 mm/mempolicy.c:2591
 filemap_alloc_folio_noprof+0x111/0x470 mm/filemap.c:1014
 __filemap_get_folio_mpol+0x3fc/0xb00 mm/filemap.c:2012
 __filemap_get_folio include/linux/pagemap.h:763 [inline]
 iomap_get_folio fs/iomap/buffered-io.c:725 [inline]
 __iomap_get_folio fs/iomap/buffered-io.c:896 [inline]
 iomap_write_begin+0x6d9/0x14f0 fs/iomap/buffered-io.c:960
 iomap_write_iter fs/iomap/buffered-io.c:1144 [inline]
 iomap_file_buffered_write+0x47a/0xb30 fs/iomap/buffered-io.c:1225
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5710 tgid 5710 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999
 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008
 free_pages_and_swap_cache+0x2b9/0x490 mm/swap_state.c:401
 __tlb_batch_free_encoded_pages mm/mmu_gather.c:138 [inline]
 tlb_batch_pages_flush mm/mmu_gather.c:151 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:417 [inline]
 tlb_flush_mmu+0x6d3/0xa30 mm/mmu_gather.c:424
 tlb_finish_mmu+0xf9/0x230 mm/mmu_gather.c:549
 exit_mmap+0x498/0x9e0 mm/mmap.c:1313
 __mmput+0x118/0x430 kernel/fork.c:1178
 exit_mm+0x1f6/0x2d0 kernel/exit.c:582
 do_exit+0x6a2/0x22c0 kernel/exit.c:964
 do_group_exit+0x21b/0x2d0 kernel/exit.c:1119
 get_signal+0x1284/0x1330 kernel/signal.c:3037
 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337
 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
 exit_to_user_mode_loop+0xa9/0x680 kernel/entry/common.c:98
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline]
 do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Modules linked in:
CPU: 0 UID: 0 PID: 5877 Comm: syz.0.17 Tainted: G B syzkaller #0 PREEMPT(full)
Tainted: [B]=BAD_PAGE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 bad_page+0x17f/0x1c0 mm/page_alloc.c:632
 free_page_is_bad mm/page_alloc.c:1076 [inline]
 __free_pages_prepare mm/page_alloc.c:1388 [inline]
 __free_frozen_pages+0xcd9/0xd30 mm/page_alloc.c:2938
 __folio_put+0x4a2/0x580 mm/swap.c:112
 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199
 try_folio_split_to_order include/linux/huge_mm.h:411 [inline]
 try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189
 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255
 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416
 iomap_write_failed fs/iomap/buffered-io.c:785 [inline]
 iomap_write_iter fs/iomap/buffered-io.c:1187 [inline]
 iomap_file_buffered_write+0x788/0xb30 fs/iomap/buffered-io.c:1225
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3f0719ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3f07ff7028 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
RAX: ffffffffffffffda RBX: 00007f3f07415fa0 RCX: 00007f3f0719ce59
RDX: 00000000ffffffb7 RSI: 0000200000000040 RDI: 0000000000000004
RBP: 00007f3f07232d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008080c61 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3f07416038 R14: 00007f3f07415fa0 R15: 00007ffe415e9148
 </TASK>
BUG: Bad page state in process syz.0.17 pfn:1a6488
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x8088 pfn:0x1a6488
head: order:0 mapcount:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0
flags: 0x57ff20000000040(head|node=1|zone=2|lastcpupid=0x7ff)
raw: 057ff20000000040 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000008088 0000000000000000 00000000ffffffff 0000000000000000
head: 057ff20000000040 0000000000000000 dead000000000122 0000000000000000
head: 0000000000008088 0000000000000000 00000000ffffffff 0000000000000000
head: 057ff00000000000 0000000000000000 00000000ffffffff 0000000000000000
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Movable, gfp_mask 0x153c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_MOVABLE|__GFP_WRITE|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL), pid 5877, tgid 5876 (syz.0.17), ts 79178255762, free_ts 72346997158
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_pages_mpol+0x235/0x490 mm/mempolicy.c:2490
 alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline]
 alloc_pages_noprof+0xac/0x2a0 mm/mempolicy.c:2581
 folio_alloc_noprof+0x1e/0x30 mm/mempolicy.c:2591
 filemap_alloc_folio_noprof+0x111/0x470 mm/filemap.c:1014
 __filemap_get_folio_mpol+0x3fc/0xb00 mm/filemap.c:2012
 __filemap_get_folio include/linux/pagemap.h:763 [inline]
 iomap_get_folio fs/iomap/buffered-io.c:725 [inline]
 __iomap_get_folio fs/iomap/buffered-io.c:896 [inline]
 iomap_write_begin+0x6d9/0x14f0 fs/iomap/buffered-io.c:960
 iomap_write_iter fs/iomap/buffered-io.c:1144 [inline]
 iomap_file_buffered_write+0x47a/0xb30 fs/iomap/buffered-io.c:1225
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5710 tgid 5710 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999
 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008
 free_pages_and_swap_cache+0x2b9/0x490 mm/swap_state.c:401
 __tlb_batch_free_encoded_pages mm/mmu_gather.c:138 [inline]
 tlb_batch_pages_flush mm/mmu_gather.c:151 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:417 [inline]
 tlb_flush_mmu+0x6d3/0xa30 mm/mmu_gather.c:424
 tlb_finish_mmu+0xf9/0x230 mm/mmu_gather.c:549
 exit_mmap+0x498/0x9e0 mm/mmap.c:1313
 __mmput+0x118/0x430 kernel/fork.c:1178
 exit_mm+0x1f6/0x2d0 kernel/exit.c:582
 do_exit+0x6a2/0x22c0 kernel/exit.c:964
 do_group_exit+0x21b/0x2d0 kernel/exit.c:1119
 get_signal+0x1284/0x1330 kernel/signal.c:3037
 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337
 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
 exit_to_user_mode_loop+0xa9/0x680 kernel/entry/common.c:98
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline]
 do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Modules linked in:
CPU: 1 UID: 0 PID: 5877 Comm: syz.0.17 Tainted: G B syzkaller #0 PREEMPT(full)
Tainted: [B]=BAD_PAGE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 bad_page+0x17f/0x1c0 mm/page_alloc.c:632
 free_page_is_bad mm/page_alloc.c:1076 [inline]
 __free_pages_prepare mm/page_alloc.c:1388 [inline]
 __free_frozen_pages+0xcd9/0xd30 mm/page_alloc.c:2938
 __folio_put+0x4a2/0x580 mm/swap.c:112
 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199
 try_folio_split_to_order include/linux/huge_mm.h:411 [inline]
 try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189
 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255
 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416
 iomap_write_failed fs/iomap/buffered-io.c:785 [inline]
 iomap_write_iter fs/iomap/buffered-io.c:1187 [inline]
 iomap_file_buffered_write+0x788/0xb30 fs/iomap/buffered-io.c:1225
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3f0719ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3f07ff7028 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
RAX: ffffffffffffffda RBX: 00007f3f07415fa0 RCX: 00007f3f0719ce59
RDX: 00000000ffffffb7 RSI: 0000200000000040 RDI: 0000000000000004
RBP: 00007f3f07232d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008080c61 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3f07416038 R14: 00007f3f07415fa0 R15: 00007ffe415e9148
 </TASK>
BUG: Bad page state in process syz.0.17 pfn:1a6490
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x8090 pfn:0x1a6490
head: order:0 mapcount:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0
flags: 0x57ff20000000040(head|node=1|zone=2|lastcpupid=0x7ff)
raw: 057ff20000000040 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000008090 0000000000000000 00000000ffffffff 0000000000000000
head: 057ff20000000040 0000000000000000 dead000000000122 0000000000000000
head: 0000000000008090 0000000000000000 00000000ffffffff 0000000000000000
head: 057ff00000000000 0000000000000000 00000000ffffffff 0000000000000000
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
page_owner tracks the page as allocated
page last allocated via order 4, migratetype Movable, gfp_mask 0x153c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_MOVABLE|__GFP_WRITE|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL), pid 5877, tgid 5876 (syz.0.17), ts 79178255762, free_ts 72346919466
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_pages_mpol+0x235/0x490 mm/mempolicy.c:2490
 alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline]
 alloc_pages_noprof+0xac/0x2a0 mm/mempolicy.c:2581
 folio_alloc_noprof+0x1e/0x30 mm/mempolicy.c:2591
 filemap_alloc_folio_noprof+0x111/0x470 mm/filemap.c:1014
 __filemap_get_folio_mpol+0x3fc/0xb00 mm/filemap.c:2012
 __filemap_get_folio include/linux/pagemap.h:763 [inline]
 iomap_get_folio fs/iomap/buffered-io.c:725 [inline]
 __iomap_get_folio fs/iomap/buffered-io.c:896 [inline]
 iomap_write_begin+0x6d9/0x14f0 fs/iomap/buffered-io.c:960
 iomap_write_iter fs/iomap/buffered-io.c:1144 [inline]
 iomap_file_buffered_write+0x47a/0xb30 fs/iomap/buffered-io.c:1225
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5710 tgid 5710 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999
 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008
 free_pages_and_swap_cache+0x2b9/0x490 mm/swap_state.c:401
 __tlb_batch_free_encoded_pages mm/mmu_gather.c:138 [inline]
 tlb_batch_pages_flush mm/mmu_gather.c:151 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:417 [inline]
 tlb_flush_mmu+0x6d3/0xa30 mm/mmu_gather.c:424
 tlb_finish_mmu+0xf9/0x230 mm/mmu_gather.c:549
 exit_mmap+0x498/0x9e0 mm/mmap.c:1313
 __mmput+0x118/0x430 kernel/fork.c:1178
 exit_mm+0x1f6/0x2d0 kernel/exit.c:582
 do_exit+0x6a2/0x22c0 kernel/exit.c:964
 do_group_exit+0x21b/0x2d0 kernel/exit.c:1119
 get_signal+0x1284/0x1330 kernel/signal.c:3037
 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337
 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
 exit_to_user_mode_loop+0xa9/0x680 kernel/entry/common.c:98
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline]
 do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Modules linked in:
CPU: 1 UID: 0 PID: 5877 Comm: syz.0.17 Tainted: G B syzkaller #0 PREEMPT(full)
Tainted: [B]=BAD_PAGE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 bad_page+0x17f/0x1c0 mm/page_alloc.c:632
 free_page_is_bad mm/page_alloc.c:1076 [inline]
 __free_pages_prepare mm/page_alloc.c:1388 [inline]
 __free_pages_ok+0xb8c/0xbd0 mm/page_alloc.c:1578
 __folio_put+0x4a2/0x580 mm/swap.c:112
 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199
 try_folio_split_to_order include/linux/huge_mm.h:411 [inline]
 try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189
 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255
 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416
 iomap_write_failed fs/iomap/buffered-io.c:785 [inline]
 iomap_write_iter fs/iomap/buffered-io.c:1187 [inline]
 iomap_file_buffered_write+0x788/0xb30 fs/iomap/buffered-io.c:1225
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3f0719ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3f07ff7028 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
RAX: ffffffffffffffda RBX: 00007f3f07415fa0 RCX: 00007f3f0719ce59
RDX: 00000000ffffffb7 RSI: 0000200000000040 RDI: 0000000000000004
RBP: 00007f3f07232d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008080c61 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3f07416038 R14: 00007f3f07415fa0 R15: 00007ffe415e9148
 </TASK>
BUG: Bad page state in process syz.0.17 pfn:1a64a0
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x80a0 pfn:0x1a64a0
head: order:0 mapcount:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0
flags: 0x57ff20000000040(head|node=1|zone=2|lastcpupid=0x7ff)
raw: 057ff20000000040 0000000000000000 dead000000000122 0000000000000000
raw: 00000000000080a0 0000000000000000 00000000ffffffff 0000000000000000
head: 057ff20000000040 0000000000000000 dead000000000122 0000000000000000
head: 00000000000080a0 0000000000000000 00000000ffffffff 0000000000000000
head: 057ff00000000000 0000000000000000 00000000ffffffff 0000000000000000
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
page_owner tracks the page as allocated
page last allocated via order 5, migratetype Movable, gfp_mask 0x153c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_MOVABLE|__GFP_WRITE|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL), pid 5877, tgid 5876 (syz.0.17), ts 79178255762, free_ts 72346647882
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_pages_mpol+0x235/0x490 mm/mempolicy.c:2490
 alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline]
 alloc_pages_noprof+0xac/0x2a0 mm/mempolicy.c:2581
 folio_alloc_noprof+0x1e/0x30 mm/mempolicy.c:2591
 filemap_alloc_folio_noprof+0x111/0x470 mm/filemap.c:1014
 __filemap_get_folio_mpol+0x3fc/0xb00 mm/filemap.c:2012
 __filemap_get_folio include/linux/pagemap.h:763 [inline]
 iomap_get_folio fs/iomap/buffered-io.c:725 [inline]
 __iomap_get_folio fs/iomap/buffered-io.c:896 [inline]
 iomap_write_begin+0x6d9/0x14f0 fs/iomap/buffered-io.c:960
 iomap_write_iter fs/iomap/buffered-io.c:1144 [inline]
 iomap_file_buffered_write+0x47a/0xb30 fs/iomap/buffered-io.c:1225
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5710 tgid 5710 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999
 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008
 free_pages_and_swap_cache+0x2b9/0x490 mm/swap_state.c:401
 __tlb_batch_free_encoded_pages mm/mmu_gather.c:138 [inline]
 tlb_batch_pages_flush mm/mmu_gather.c:151 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:417 [inline]
 tlb_flush_mmu+0x6d3/0xa30 mm/mmu_gather.c:424
 tlb_finish_mmu+0xf9/0x230 mm/mmu_gather.c:549
 exit_mmap+0x498/0x9e0 mm/mmap.c:1313
 __mmput+0x118/0x430 kernel/fork.c:1178
 exit_mm+0x1f6/0x2d0 kernel/exit.c:582
 do_exit+0x6a2/0x22c0 kernel/exit.c:964
 do_group_exit+0x21b/0x2d0 kernel/exit.c:1119
 get_signal+0x1284/0x1330 kernel/signal.c:3037
 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337
 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
 exit_to_user_mode_loop+0xa9/0x680 kernel/entry/common.c:98
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline]
 do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Modules linked in:
CPU: 0 UID: 0 PID: 5877 Comm: syz.0.17 Tainted: G B syzkaller #0 PREEMPT(full)
Tainted: [B]=BAD_PAGE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 bad_page+0x17f/0x1c0 mm/page_alloc.c:632
 free_page_is_bad mm/page_alloc.c:1076 [inline]
 __free_pages_prepare mm/page_alloc.c:1388 [inline]
 __free_pages_ok+0xb8c/0xbd0 mm/page_alloc.c:1578
 __folio_put+0x4a2/0x580 mm/swap.c:112
 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199
 try_folio_split_to_order include/linux/huge_mm.h:411 [inline]
 try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189
 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255
 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416
 iomap_write_failed fs/iomap/buffered-io.c:785 [inline]
 iomap_write_iter fs/iomap/buffered-io.c:1187 [inline]
 iomap_file_buffered_write+0x788/0xb30 fs/iomap/buffered-io.c:1225
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3f0719ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3f07ff7028 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
RAX: ffffffffffffffda RBX: 00007f3f07415fa0 RCX: 00007f3f0719ce59
RDX: 00000000ffffffb7 RSI: 0000200000000040 RDI: 0000000000000004
RBP: 00007f3f07232d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008080c61 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3f07416038 R14: 00007f3f07415fa0 R15: 00007ffe415e9148
 </TASK>
BUG: Bad page state in process syz.0.17 pfn:1a64c0
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x80c0 pfn:0x1a64c0
head: order:0 mapcount:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0
flags: 0x57ff20000000040(head|node=1|zone=2|lastcpupid=0x7ff)
raw: 057ff20000000040 0000000000000000 dead000000000122 0000000000000000
raw: 00000000000080c0 0000000000000000 00000000ffffffff 0000000000000000
head: 057ff20000000040 0000000000000000 dead000000000122 0000000000000000
head: 00000000000080c0 0000000000000000 00000000ffffffff 0000000000000000
head: 057ff00000000000 0000000000000000 00000000ffffffff 0000000000000000
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
page_owner tracks the page as allocated
page last allocated via order 6, migratetype Movable, gfp_mask 0x153c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_MOVABLE|__GFP_WRITE|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_HARDWALL), pid 5877, tgid 5876 (syz.0.17), ts 79178255762, free_ts 72346190117
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_pages_mpol+0x235/0x490 mm/mempolicy.c:2490
 alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline]
 alloc_pages_noprof+0xac/0x2a0 mm/mempolicy.c:2581
 folio_alloc_noprof+0x1e/0x30 mm/mempolicy.c:2591
 filemap_alloc_folio_noprof+0x111/0x470 mm/filemap.c:1014
 __filemap_get_folio_mpol+0x3fc/0xb00 mm/filemap.c:2012
 __filemap_get_folio include/linux/pagemap.h:763 [inline]
 iomap_get_folio fs/iomap/buffered-io.c:725 [inline]
 __iomap_get_folio fs/iomap/buffered-io.c:896 [inline]
 iomap_write_begin+0x6d9/0x14f0 fs/iomap/buffered-io.c:960
 iomap_write_iter fs/iomap/buffered-io.c:1144 [inline]
 iomap_file_buffered_write+0x47a/0xb30 fs/iomap/buffered-io.c:1225
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5710 tgid 5710 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999
 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008
 free_pages_and_swap_cache+0x2b9/0x490 mm/swap_state.c:401
 __tlb_batch_free_encoded_pages mm/mmu_gather.c:138 [inline]
 tlb_batch_pages_flush mm/mmu_gather.c:151 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:417 [inline]
 tlb_flush_mmu+0x6d3/0xa30 mm/mmu_gather.c:424
 tlb_finish_mmu+0xf9/0x230 mm/mmu_gather.c:549
 exit_mmap+0x498/0x9e0 mm/mmap.c:1313
 __mmput+0x118/0x430 kernel/fork.c:1178
 exit_mm+0x1f6/0x2d0 kernel/exit.c:582
 do_exit+0x6a2/0x22c0 kernel/exit.c:964
 do_group_exit+0x21b/0x2d0 kernel/exit.c:1119
 get_signal+0x1284/0x1330 kernel/signal.c:3037
 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337
 __exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
 exit_to_user_mode_loop+0xa9/0x680 kernel/entry/common.c:98
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline]
 do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Modules linked in:
CPU: 0 UID: 0 PID: 5877 Comm: syz.0.17 Tainted: G B syzkaller #0 PREEMPT(full)
Tainted: [B]=BAD_PAGE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 bad_page+0x17f/0x1c0 mm/page_alloc.c:632
 free_page_is_bad mm/page_alloc.c:1076 [inline]
 __free_pages_prepare mm/page_alloc.c:1388 [inline]
 __free_pages_ok+0xb8c/0xbd0 mm/page_alloc.c:1578
 __folio_put+0x4a2/0x580 mm/swap.c:112
 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199
 try_folio_split_to_order include/linux/huge_mm.h:411 [inline]
 try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189
 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255
 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416
 iomap_write_failed fs/iomap/buffered-io.c:785 [inline]
 iomap_write_iter fs/iomap/buffered-io.c:1187 [inline]
 iomap_file_buffered_write+0x788/0xb30 fs/iomap/buffered-io.c:1225
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1056
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3f0719ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3f07ff7028 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
RAX: ffffffffffffffda RBX: 00007f3f07415fa0 RCX: 00007f3f0719ce59
RDX: 00000000ffffffb7 RSI: 0000200000000040 RDI: 0000000000000004
RBP: 00007f3f07232d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000008080c61 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3f07416038 R14: 00007f3f07415fa0 R15: 00007ffe415e9148
 </TASK>

***

BUG: Bad page state in shmem_get_folio_gfp

tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: 1ec3cca2d8b6b9ff6584ca626d4c8918bbf48d44
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/ffde37a3-aed0-4f49-bba1-ca31cd6a4b04/config
syz repro: https://ci.syzbot.org/findings/f40ca5d2-8fd7-4dbe-a861-a7c4a5f442dd/syz_repro

BUG: Bad page state in process syz.0.53 pfn:11ea80
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x680 pfn:0x11ea80
head: order:0 mapcount:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0
flags: 0x17ff7800002025c(referenced|uptodate|dirty|workingset|head|swapbacked|node=0|zone=2|lastcpupid=0x7ff)
raw: 017ff7800002025c 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000680 0000000000000000 00000000ffffffff 0000000000000000
head: 017ff7800002025c 0000000000000000 dead000000000122 0000000000000000
head: 0000000000000680 0000000000000000 00000000ffffffff 0000000000000000
head: 017ff00000000000 0000000000000000 00000000ffffffff 0000000000000000
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
page_owner tracks the page as allocated
page last allocated via order 7, migratetype Movable, gfp_mask 0x3d20ca(GFP_TRANSHUGE_LIGHT|__GFP_NORETRY|__GFP_THISNODE), pid 5990, tgid 5988 (syz.0.53), ts 80487329937, free_ts 80461370315
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_pages_mpol+0x1da/0x490 mm/mempolicy.c:2476
 folio_alloc_mpol_noprof+0x39/0x160 mm/mempolicy.c:2509
 shmem_alloc_folio+0xba/0x160 mm/shmem.c:1933
 shmem_alloc_and_add_folio+0x62f/0xf80 mm/shmem.c:1962
 shmem_get_folio_gfp+0x555/0x1670 mm/shmem.c:2552
 shmem_get_folio mm/shmem.c:2670 [inline]
 shmem_write_begin+0x16c/0x330 mm/shmem.c:3303
 generic_perform_write+0x2e2/0x8f0 mm/filemap.c:4325
 shmem_file_write_iter+0xf8/0x120 mm/shmem.c:3478
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_write+0x150/0x270 fs/read_write.c:740
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5749 tgid 5749 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999
 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008
 folio_batch_release include/linux/folio_batch.h:101 [inline]
 shmem_undo_range+0x52c/0x1660 mm/shmem.c:1149
 shmem_truncate_range mm/shmem.c:1277 [inline]
 shmem_evict_inode+0x289/0xae0 mm/shmem.c:1407
 evict+0x61e/0xb10 fs/inode.c:841
 __dentry_kill+0x1a2/0x690 fs/dcache.c:718
 shrink_kill+0xa9/0x2c0 fs/dcache.c:1195
 shrink_dentry_list+0x2e0/0x5e0 fs/dcache.c:1222
 shrink_dcache_tree+0xe9/0x5d0 fs/dcache.c:-1
 do_one_tree fs/dcache.c:1721 [inline]
 shrink_dcache_for_umount+0xa8/0x1f0 fs/dcache.c:1738
 generic_shutdown_super+0x6f/0x2d0 fs/super.c:624
 kill_anon_super+0x3b/0x70 fs/super.c:1292
 deactivate_locked_super+0xbc/0x130 fs/super.c:476
 cleanup_mnt+0x437/0x4d0 fs/namespace.c:1312
 task_work_run+0x1d9/0x270 kernel/task_work.c:233
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 __exit_to_user_mode_loop kernel/entry/common.c:67 [inline]
 exit_to_user_mode_loop+0x193/0x680 kernel/entry/common.c:98
Modules linked in:
CPU: 0 UID: 0 PID: 5990 Comm: syz.0.53 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 bad_page+0x17f/0x1c0 mm/page_alloc.c:632
 free_page_is_bad mm/page_alloc.c:1076 [inline]
 __free_pages_prepare mm/page_alloc.c:1388 [inline]
 __free_pages_ok+0xb8c/0xbd0 mm/page_alloc.c:1578
 __folio_put+0x4a2/0x580 mm/swap.c:112
 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199
 try_folio_split_to_order include/linux/huge_mm.h:411 [inline]
 try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189
 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255
 shmem_undo_range+0x9a2/0x1660 mm/shmem.c:1181
 shmem_truncate_range mm/shmem.c:1277 [inline]
 shmem_fallocate+0x51c/0xec0 mm/shmem.c:3703
 vfs_fallocate+0x669/0x7e0 fs/open.c:338
 madvise_remove mm/madvise.c:1039 [inline]
 madvise_vma_behavior+0x2bc8/0x4300 mm/madvise.c:1352
 madvise_walk_vmas+0x573/0xae0 mm/madvise.c:1713
 madvise_do_behavior+0x386/0x540 mm/madvise.c:1929
 do_madvise+0x1fa/0x2e0 mm/madvise.c:2022
 __do_sys_madvise mm/madvise.c:2031 [inline]
 __se_sys_madvise mm/madvise.c:2029 [inline]
 __x64_sys_madvise+0xa6/0xc0 mm/madvise.c:2029
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc37db9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fc37ead0028 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007fc37de15fa0 RCX: 00007fc37db9ce59
RDX: 0000000000000009 RSI: 0000000000600003 RDI: 0000200000000000
RBP: 00007fc37dc32d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fc37de16038 R14: 00007fc37de15fa0 R15: 00007fff07d58848
 </TASK>
BUG: Bad page state in process syz.0.53 pfn:11eb00
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x700 pfn:0x11eb00
head: order:0 mapcount:0 entire_mapcount:1 nr_pages_mapped:0 pincount:0
flags: 0x17ff7800002025c(referenced|uptodate|dirty|workingset|head|swapbacked|node=0|zone=2|lastcpupid=0x7ff)
raw: 017ff7800002025c 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000700 0000000000000000 00000000ffffffff 0000000000000000
head: 017ff7800002025c 0000000000000000 dead000000000122 0000000000000000
head: 0000000000000700 0000000000000000 00000000ffffffff 0000000000000000
head: 017ff00000000000 0000000000000000 00000000ffffffff 0000000000000000
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
page_owner tracks the page as allocated
page last allocated via order 8, migratetype Movable, gfp_mask 0x3d20ca(GFP_TRANSHUGE_LIGHT|__GFP_NORETRY|__GFP_THISNODE), pid 5990, tgid 5988 (syz.0.53), ts 80487329937, free_ts 80461370315
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_pages_mpol+0x1da/0x490 mm/mempolicy.c:2476
 folio_alloc_mpol_noprof+0x39/0x160 mm/mempolicy.c:2509
 shmem_alloc_folio+0xba/0x160 mm/shmem.c:1933
 shmem_alloc_and_add_folio+0x62f/0xf80 mm/shmem.c:1962
 shmem_get_folio_gfp+0x555/0x1670 mm/shmem.c:2552
 shmem_get_folio mm/shmem.c:2670 [inline]
 shmem_write_begin+0x16c/0x330 mm/shmem.c:3303
 generic_perform_write+0x2e2/0x8f0 mm/filemap.c:4325
 shmem_file_write_iter+0xf8/0x120 mm/shmem.c:3478
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_write+0x150/0x270 fs/read_write.c:740
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5749 tgid 5749 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 free_unref_folios+0xd9f/0x14c0 mm/page_alloc.c:2999
 folios_put_refs+0x9ff/0xb40 mm/swap.c:1008
 folio_batch_release include/linux/folio_batch.h:101 [inline]
 shmem_undo_range+0x52c/0x1660 mm/shmem.c:1149
 shmem_truncate_range mm/shmem.c:1277 [inline]
 shmem_evict_inode+0x289/0xae0 mm/shmem.c:1407
 evict+0x61e/0xb10 fs/inode.c:841
 __dentry_kill+0x1a2/0x690 fs/dcache.c:718
 shrink_kill+0xa9/0x2c0 fs/dcache.c:1195
 shrink_dentry_list+0x2e0/0x5e0 fs/dcache.c:1222
 shrink_dcache_tree+0xe9/0x5d0 fs/dcache.c:-1
 do_one_tree fs/dcache.c:1721 [inline]
 shrink_dcache_for_umount+0xa8/0x1f0 fs/dcache.c:1738
 generic_shutdown_super+0x6f/0x2d0 fs/super.c:624
 kill_anon_super+0x3b/0x70 fs/super.c:1292
 deactivate_locked_super+0xbc/0x130 fs/super.c:476
 cleanup_mnt+0x437/0x4d0 fs/namespace.c:1312
 task_work_run+0x1d9/0x270 kernel/task_work.c:233
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 __exit_to_user_mode_loop kernel/entry/common.c:67 [inline]
 exit_to_user_mode_loop+0x193/0x680 kernel/entry/common.c:98
Modules linked in:
CPU: 0 UID: 0 PID: 5990 Comm: syz.0.53 Tainted: G B syzkaller #0 PREEMPT(full)
Tainted: [B]=BAD_PAGE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 bad_page+0x17f/0x1c0 mm/page_alloc.c:632
 free_page_is_bad mm/page_alloc.c:1076 [inline]
 __free_pages_prepare mm/page_alloc.c:1388 [inline]
 __free_pages_ok+0xb8c/0xbd0 mm/page_alloc.c:1578
 __folio_put+0x4a2/0x580 mm/swap.c:112
 __folio_split+0xffe/0x1570 mm/huge_memory.c:4199
 try_folio_split_to_order include/linux/huge_mm.h:411 [inline]
 try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189
 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255
 shmem_undo_range+0x9a2/0x1660 mm/shmem.c:1181
 shmem_truncate_range mm/shmem.c:1277 [inline]
 shmem_fallocate+0x51c/0xec0 mm/shmem.c:3703
 vfs_fallocate+0x669/0x7e0 fs/open.c:338
 madvise_remove mm/madvise.c:1039 [inline]
 madvise_vma_behavior+0x2bc8/0x4300 mm/madvise.c:1352
 madvise_walk_vmas+0x573/0xae0 mm/madvise.c:1713
 madvise_do_behavior+0x386/0x540 mm/madvise.c:1929
 do_madvise+0x1fa/0x2e0 mm/madvise.c:2022
 __do_sys_madvise mm/madvise.c:2031 [inline]
 __se_sys_madvise mm/madvise.c:2029 [inline]
 __x64_sys_madvise+0xa6/0xc0 mm/madvise.c:2029
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc37db9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fc37ead0028 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007fc37de15fa0 RCX: 00007fc37db9ce59
RDX: 0000000000000009 RSI: 0000000000600003 RDI: 0000200000000000
RBP: 00007fc37dc32d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fc37de16038 R14: 00007fc37de15fa0 R15: 00007fff07d58848
 </TASK>

***

If these findings have caused you to resend the series or submit a separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
* Re: [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU
2026-06-10 12:05 [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU zhaoyang.huang
` (2 preceding siblings ...)
2026-06-11 7:33 ` [syzbot ci] " syzbot ci
@ 2026-06-11 9:30 ` Lorenzo Stoakes
3 siblings, 0 replies; 16+ messages in thread
From: Lorenzo Stoakes @ 2026-06-11 9:30 UTC
To: zhaoyang.huang
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Barry Song,
Liam R. Howlett, Baolin Wang, Lance Yang, Nico Pache, Ryan Roberts,
Dev Jain, linux-mm, linux-kernel, Zhaoyang Huang, steve.kang

-cc incorrect email addresses
+cc correct ones

$ scripts/get_maintainer.pl --no-git mm/huge_memory.c
Andrew Morton <akpm@linux-foundation.org> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
David Hildenbrand <david@kernel.org> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
Lorenzo Stoakes <ljs@kernel.org> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
                ^
                |----- Please use the correct email address.
Zi Yan <ziy@nvidia.com> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
Baolin Wang <baolin.wang@linux.alibaba.com> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
"Liam R. Howlett" <liam@infradead.org> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
                  ^
                  |--- Please use the correct email address.
Nico Pache <npache@redhat.com> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
Ryan Roberts <ryan.roberts@arm.com> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
Dev Jain <dev.jain@arm.com> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
Barry Song <baohua@kernel.org> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
Lance Yang <lance.yang@linux.dev> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
linux-mm@kvack.org (open list:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
linux-kernel@vger.kernel.org (open list)

Thanks, Lorenzo
Thread overview: 16+ messages
2026-06-10 12:05 [RFC PATCH] mm/huge_memory: do not add dropped split tail folios to LRU zhaoyang.huang
2026-06-10 12:50 ` David Hildenbrand (Arm)
2026-06-10 14:38 ` Zi Yan
2026-06-10 17:25 ` Zi Yan
2026-06-10 18:44 ` Zi Yan
2026-06-11 1:19 ` Zhaoyang Huang
2026-06-11 1:49 ` Zi Yan
2026-06-11 1:39 ` Zhaoyang Huang
2026-06-11 1:56 ` Zi Yan
2026-06-11 2:39 ` Zhaoyang Huang
2026-06-11 3:06 ` Zi Yan
2026-06-11 7:45 ` Zhaoyang Huang
2026-06-10 20:30 ` Andrew Morton
2026-06-10 20:36 ` Zi Yan
2026-06-11 7:33 ` [syzbot ci] " syzbot ci
2026-06-11 9:30 ` [RFC PATCH] " Lorenzo Stoakes