* [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
@ 2026-04-01 13:10 Lance Yang
2026-04-01 16:28 ` Usama Arif
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Lance Yang @ 2026-04-01 13:10 UTC
To: akpm
Cc: david, ljs, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
dev.jain, baohua, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, richard.weiyang,
usama.arif, linux-mm, linux-kernel, kartikey406,
syzbot+a7067a757858ac8eb085, stable, Lance Yang
From: Lance Yang <lance.yang@linux.dev>
migrate_folio_move() records the deferred split queue state from src and
replays it on dst. Replaying it after remove_migration_ptes(src, dst, 0)
makes dst visible before it is requeued, so a concurrent rmap-removal path
can mark dst partially mapped and trip the WARN in deferred_split_folio().
Move the requeue before remove_migration_ptes() so dst is back on the
deferred split queue before it becomes visible again.
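To make the ordering problem concrete, here is a minimal single-threaded userspace model. It is not kernel code. It assumes that the WARN being tripped is the check in deferred_split_folio() that a folio already marked partially mapped must not be requeued as not partially mapped, and it ignores that function's other early returns. struct model_folio, model_deferred_split(), model_partial_unmap() and model_migrate_move() are hypothetical names. The "concurrent" rmap removal is simply scheduled before or after the replay to mimic the old and the patched ordering.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of folio state that matter here. */
struct model_folio {
	bool queued;           /* on the deferred split queue */
	bool partially_mapped; /* models the partially-mapped folio flag */
};

/*
 * Models the assumed check in deferred_split_folio(): a folio that is
 * already partially mapped must not be requeued as "not partially mapped".
 * Returns false where the kernel would hit its VM_WARN.
 */
static bool model_deferred_split(struct model_folio *f, bool partially_mapped)
{
	bool ok = true;

	if (partially_mapped)
		f->partially_mapped = true;
	else if (f->partially_mapped)
		ok = false; /* partially mapped cannot go back to fully mapped */
	f->queued = true;
	return ok;
}

/* A concurrent rmap removal that unmaps part of a visible dst. */
static bool model_partial_unmap(struct model_folio *f)
{
	return model_deferred_split(f, true);
}

/*
 * Replays src's recorded state (queued, not partially mapped) on dst.
 * requeue_first == true models the patched order (requeue, then
 * remove_migration_ptes()); false models the old order, where the unmap
 * sneaks in between remove_migration_ptes() and the replay.
 * Returns true if no step would WARN and dst ends up queued.
 */
bool model_migrate_move(bool requeue_first)
{
	struct model_folio dst = { .queued = false, .partially_mapped = false };
	const bool src_partially_mapped = false;
	bool ok = true;

	if (requeue_first) {
		ok = model_deferred_split(&dst, src_partially_mapped) && ok;
		/* remove_migration_ptes(): dst is now visible and gets unmapped */
		ok = model_partial_unmap(&dst) && ok;
	} else {
		/* remove_migration_ptes(): dst is visible before being requeued */
		ok = model_partial_unmap(&dst) && ok;
		ok = model_deferred_split(&dst, src_partially_mapped) && ok;
	}
	return ok && dst.queued;
}
```

In this model model_migrate_move(false) returns false, because the unmap marks dst partially mapped before the replay asks for "not partially mapped". model_migrate_move(true) returns true: the replay runs while dst is still unreachable, and the later unmap only upgrades it to partially mapped.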
Because migration still holds dst locked at that point, teach
deferred_split_scan() to requeue a folio when folio_trylock() fails.
Otherwise a fully mapped underused folio can be dequeued by the shrinker
and silently lost from split_queue.
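The shrinker side can be sketched in the same way. The following is a hypothetical, simplified decision function (model_scan_one() and enum scan_outcome are illustrative names, not kernel API) that follows the control flow of the mm/huge_memory.c hunk in this patch. A folio that is neither partially mapped nor underused is dropped; a trylock failure now jumps to requeue; and after a failed split, only partially mapped folios go back on the queue. The mlocked check and the queue locking are omitted.

```c
#include <stdbool.h>

/* Hypothetical outcomes for one folio taken off the split queue. */
enum scan_outcome {
	SCAN_SPLIT,   /* split_folio() succeeded */
	SCAN_DROP,    /* left off split_queue */
	SCAN_REQUEUE, /* added back to split_queue */
};

/*
 * Simplified model of one deferred_split_scan() iteration. underused is
 * only meaningful for folios that are not partially mapped, mirroring the
 * thp_underused() call in the real loop. patched selects the new
 * "goto requeue" on trylock failure instead of the old "goto next".
 */
enum scan_outcome model_scan_one(bool patched, bool partially_mapped,
				 bool underused, bool trylock_ok, bool split_ok)
{
	if (!partially_mapped && !underused)
		return SCAN_DROP;            /* "goto next" for a used folio */
	if (!trylock_ok) {
		if (patched)
			return SCAN_REQUEUE; /* "goto requeue" */
		goto next;
	}
	if (split_ok)
		return SCAN_SPLIT;
next:
	/* Only partially mapped folios are added back from here. */
	return partially_mapped ? SCAN_REQUEUE : SCAN_DROP;
}
```

For example, model_scan_one(false, false, true, false, false), an underused, fully mapped folio whose lock is held by migration, yields SCAN_DROP under the old flow (the "silently lost" case), while the patched flow, model_scan_one(true, false, true, false, false), returns SCAN_REQUEUE.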
Link: https://syzkaller.appspot.com/bug?extid=a7067a757858ac8eb085
Fixes: 8a8ca142a488 ("mm: migrate: requeue destination folio on deferred split queue")
Reported-by: syzbot+a7067a757858ac8eb085@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/69ccb65b.050a0220.183828.003a.GAE@google.com/
Cc: <stable@vger.kernel.org>
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
[ Backport note ]
This patch is a follow-up fix for 8a8ca142a488 ("mm: migrate: requeue
destination folio on deferred split queue"), which is currently only in
mm-stable, and should be backported together with it.
Credit for this fix goes to David, thanks!
mm/huge_memory.c | 12 +++++++-----
mm/migrate.c | 18 +++++++++---------
2 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ff9a42abd1b6..ac6d823e351f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4558,7 +4558,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
goto next;
}
if (!folio_trylock(folio))
- goto next;
+ goto requeue;
if (!split_folio(folio)) {
did_split = true;
if (underused)
@@ -4569,11 +4569,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
next:
if (did_split || !folio_test_partially_mapped(folio))
continue;
+requeue:
/*
- * Only add back to the queue if folio is partially mapped.
- * If thp_underused returns false, or if split_folio fails
- * in the case it was underused, then consider it used and
- * don't add it back to split_queue.
+ * Add back partially mapped folios, or underused folios
+ * that we could not lock this round. If thp_underused()
+ * returns false, or if split_folio() succeeds, or if
+ * split_folio() fails in the case it was underused, then
+ * consider it used and don't add it back to split_queue.
*/
fqueue = folio_split_queue_lock_irqsave(folio, &flags);
if (list_empty(&folio->_deferred_list)) {
diff --git a/mm/migrate.c b/mm/migrate.c
index 05cb408846f2..8a64291ab5b4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1385,6 +1385,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
if (rc)
goto out;
+ /*
+ * Requeue the destination folio on the deferred split queue if
+ * the source was on the queue. The source is unqueued in
+ * __folio_migrate_mapping(), so we recorded the state from
+ * before move_to_new_folio().
+ */
+ if (src_deferred_split)
+ deferred_split_folio(dst, src_partially_mapped);
+
/*
* When successful, push dst to LRU immediately: so that if it
* turns out to be an mlocked page, remove_migration_ptes() will
@@ -1401,15 +1410,6 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
if (old_page_state & PAGE_WAS_MAPPED)
remove_migration_ptes(src, dst, 0);
- /*
- * Requeue the destination folio on the deferred split queue if
- * the source was on the queue. The source is unqueued in
- * __folio_migrate_mapping(), so we recorded the state from
- * before move_to_new_folio().
- */
- if (src_deferred_split)
- deferred_split_folio(dst, src_partially_mapped);
-
out_unlock_both:
folio_unlock(dst);
folio_set_owner_migrate_reason(dst, reason);
--
2.49.0
* Re: [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
2026-04-01 13:10 [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration Lance Yang
@ 2026-04-01 16:28 ` Usama Arif
2026-04-01 18:50 ` David Hildenbrand (Arm)
2026-04-01 18:51 ` David Hildenbrand (Arm)
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Usama Arif @ 2026-04-01 16:28 UTC
To: Lance Yang
Cc: Usama Arif, akpm, david, ljs, ziy, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, matthew.brost,
joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple,
richard.weiyang, linux-mm, linux-kernel, kartikey406,
syzbot+a7067a757858ac8eb085, stable
On Wed, 1 Apr 2026 21:10:32 +0800 Lance Yang <lance.yang@linux.dev> wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> migrate_folio_move() records the deferred split queue state from src and
> replays it on dst. Replaying it after remove_migration_ptes(src, dst, 0)
> makes dst visible before it is requeued, so a concurrent rmap-removal path
> can mark dst partially mapped and trip the WARN in deferred_split_folio().
>
> Move the requeue before remove_migration_ptes() so dst is back on the
> deferred split queue before it becomes visible again.
>
> Because migration still holds dst locked at that point, teach
> deferred_split_scan() to requeue a folio when folio_trylock() fails.
> Otherwise a fully mapped underused folio can be dequeued by the shrinker
> and silently lost from split_queue.
>
> Link: https://syzkaller.appspot.com/bug?extid=a7067a757858ac8eb085
> Fixes: 8a8ca142a488 ("mm: migrate: requeue destination folio on deferred split queue")
> Reported-by: syzbot+a7067a757858ac8eb085@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/linux-mm/69ccb65b.050a0220.183828.003a.GAE@google.com/
> Cc: <stable@vger.kernel.org>
> Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
>
> [ Backport note ]
> This patch is a follow-up fix for 8a8ca142a488 ("mm: migrate: requeue
> destination folio on deferred split queue"), which is currently only in
> mm-stable, and should be backported together with it.
>
> Credit for this fix goes to David, thanks!
>
> mm/huge_memory.c | 12 +++++++-----
> mm/migrate.c | 18 +++++++++---------
> 2 files changed, 16 insertions(+), 14 deletions(-)
>
Thanks for the fix! And sorry for introducing the bug in
migrate_folio_move() :)
So I am happy with the migrate_folio_move() change, it makes sense.
The goto next when the folio is locked in deferred_split_scan() was actually
on purpose. The reasoning was that if the folio is locked, we consider
it in use by someone and therefore we shouldn't split it. Even though
thp_underused() does a zero-filled check, the whole point of the shrinker
was to split THPs that are "not in use", and in my mind, a locked folio
is a folio in use. So I'm not sure about that change.
* Re: [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
2026-04-01 16:28 ` Usama Arif
@ 2026-04-01 18:50 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 18:50 UTC
To: Usama Arif, Lance Yang
Cc: akpm, ljs, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
dev.jain, baohua, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, richard.weiyang, linux-mm,
linux-kernel, kartikey406, syzbot+a7067a757858ac8eb085, stable
On 4/1/26 18:28, Usama Arif wrote:
> On Wed, 1 Apr 2026 21:10:32 +0800 Lance Yang <lance.yang@linux.dev> wrote:
>
> Thanks for the fix! And sorry for introducing the bug in
> migrate_folio_move() :)
>
> So I am happy with the migrate_folio_move() change, it makes sense.
>
> The goto next when the folio is locked in deferred_split_scan() was actually
> on purpose. The reasoning was that if the folio is locked, we consider
> it in use by someone and therefore we shouldn't split it. Even though
> thp_underused() does a zero-filled check, the whole point of the shrinker
> was to split THPs that are "not in use", and in my mind, a locked folio
> is a folio in use. So I'm not sure about that change.
That is a questionable assessment. It's about checking whether folios
are *underused*, not whether they are used by whoever in the system (e.g.,
migration).
Just take a look at when anonymous folios are actually locked :)
So the original locked handling here is just bogus.
--
Cheers,
David
* Re: [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
2026-04-01 13:10 [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration Lance Yang
2026-04-01 16:28 ` Usama Arif
@ 2026-04-01 18:51 ` David Hildenbrand (Arm)
2026-04-01 19:21 ` Zi Yan
2026-04-01 21:48 ` Andrew Morton
3 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01 18:51 UTC
To: Lance Yang, akpm
Cc: ljs, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
dev.jain, baohua, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, richard.weiyang,
usama.arif, linux-mm, linux-kernel, kartikey406,
syzbot+a7067a757858ac8eb085, stable
On 4/1/26 15:10, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> migrate_folio_move() records the deferred split queue state from src and
> replays it on dst. Replaying it after remove_migration_ptes(src, dst, 0)
> makes dst visible before it is requeued, so a concurrent rmap-removal path
> can mark dst partially mapped and trip the WARN in deferred_split_folio().
>
> Move the requeue before remove_migration_ptes() so dst is back on the
> deferred split queue before it becomes visible again.
>
> Because migration still holds dst locked at that point, teach
> deferred_split_scan() to requeue a folio when folio_trylock() fails.
> Otherwise a fully mapped underused folio can be dequeued by the shrinker
> and silently lost from split_queue.
>
> Link: https://syzkaller.appspot.com/bug?extid=a7067a757858ac8eb085
> Fixes: 8a8ca142a488 ("mm: migrate: requeue destination folio on deferred split queue")
> Reported-by: syzbot+a7067a757858ac8eb085@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/linux-mm/69ccb65b.050a0220.183828.003a.GAE@google.com/
> Cc: <stable@vger.kernel.org>
> Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
LGTM, thanks!
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
* Re: [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
2026-04-01 13:10 [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration Lance Yang
2026-04-01 16:28 ` Usama Arif
2026-04-01 18:51 ` David Hildenbrand (Arm)
@ 2026-04-01 19:21 ` Zi Yan
2026-04-01 22:55 ` Zi Yan
2026-04-01 21:48 ` Andrew Morton
3 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2026-04-01 19:21 UTC
To: Lance Yang
Cc: akpm, david, ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts,
dev.jain, baohua, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, richard.weiyang,
usama.arif, linux-mm, linux-kernel, kartikey406,
syzbot+a7067a757858ac8eb085, stable
On 1 Apr 2026, at 9:10, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> migrate_folio_move() records the deferred split queue state from src and
> replays it on dst. Replaying it after remove_migration_ptes(src, dst, 0)
> makes dst visible before it is requeued, so a concurrent rmap-removal path
> can mark dst partially mapped and trip the WARN in deferred_split_folio().
>
> Move the requeue before remove_migration_ptes() so dst is back on the
> deferred split queue before it becomes visible again.
>
> Because migration still holds dst locked at that point, teach
> deferred_split_scan() to requeue a folio when folio_trylock() fails.
> Otherwise a fully mapped underused folio can be dequeued by the shrinker
> and silently lost from split_queue.
>
> Link: https://syzkaller.appspot.com/bug?extid=a7067a757858ac8eb085
> Fixes: 8a8ca142a488 ("mm: migrate: requeue destination folio on deferred split queue")
> Reported-by: syzbot+a7067a757858ac8eb085@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/linux-mm/69ccb65b.050a0220.183828.003a.GAE@google.com/
> Cc: <stable@vger.kernel.org>
> Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
>
> [ Backport note ]
> This patch is a follow-up fix for 8a8ca142a488 ("mm: migrate: requeue
> destination folio on deferred split queue"), which is currently only in
> mm-stable, and should be backported together with it.
>
> Credit for this fix goes to David, thanks!
>
> mm/huge_memory.c | 12 +++++++-----
> mm/migrate.c | 18 +++++++++---------
> 2 files changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ff9a42abd1b6..ac6d823e351f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4558,7 +4558,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> goto next;
> }
> if (!folio_trylock(folio))
> - goto next;
> + goto requeue;
> if (!split_folio(folio)) {
> did_split = true;
> if (underused)
> @@ -4569,11 +4569,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> next:
> if (did_split || !folio_test_partially_mapped(folio))
> continue;
> +requeue:
> /*
> - * Only add back to the queue if folio is partially mapped.
> - * If thp_underused returns false, or if split_folio fails
> - * in the case it was underused, then consider it used and
> - * don't add it back to split_queue.
> + * Add back partially mapped folios, or underused folios
> + * that we could not lock this round. If thp_underused()
> + * returns false, or if split_folio() succeeds, or if
> + * split_folio() fails in the case it was underused, then
> + * consider it used and don't add it back to split_queue.
> */
Should the sentence
“If thp_underused() returns false, or if split_folio() succeeds, or if
split_folio() fails in the case it was underused, then
consider it used and don't add it back to split_queue.”
be moved to below label next?
Since “thp_underused() returns false” is describing “if (!underused) goto next”,
“split_folio() succeeds” is describing “did_split == true in the if”,
“split_folio() fails in the case it was underused” is describing
“did_split == false and !folio_test_partially_mapped(folio) in the if”.
The first sentence matches the goto requeue for folio_trylock().
Otherwise, LGTM.
Acked-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
* Re: [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
2026-04-01 13:10 [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration Lance Yang
` (2 preceding siblings ...)
2026-04-01 19:21 ` Zi Yan
@ 2026-04-01 21:48 ` Andrew Morton
3 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2026-04-01 21:48 UTC
To: Lance Yang
Cc: david, ljs, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
dev.jain, baohua, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, richard.weiyang,
usama.arif, linux-mm, linux-kernel, kartikey406,
syzbot+a7067a757858ac8eb085, stable
On Wed, 1 Apr 2026 21:10:32 +0800 Lance Yang <lance.yang@linux.dev> wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> migrate_folio_move() records the deferred split queue state from src and
> replays it on dst. Replaying it after remove_migration_ptes(src, dst, 0)
> makes dst visible before it is requeued, so a concurrent rmap-removal path
> can mark dst partially mapped and trip the WARN in deferred_split_folio().
>
> Move the requeue before remove_migration_ptes() so dst is back on the
> deferred split queue before it becomes visible again.
>
> Because migration still holds dst locked at that point, teach
> deferred_split_scan() to requeue a folio when folio_trylock() fails.
> Otherwise a fully mapped underused folio can be dequeued by the shrinker
> and silently lost from split_queue.
Thanks.
> Link: https://syzkaller.appspot.com/bug?extid=a7067a757858ac8eb085
> Fixes: 8a8ca142a488 ("mm: migrate: requeue destination folio on deferred split queue")
> Reported-by: syzbot+a7067a757858ac8eb085@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/linux-mm/69ccb65b.050a0220.183828.003a.GAE@google.com/
> Cc: <stable@vger.kernel.org>
> Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
I'll add this to mm-unstable with a plan to move it into the current
mm-stable batch in a few days, so that 8a8ca142a488 and this
follow-up fix stay in the same bundle.
> [ Backport note ]
> This patch is a follow-up fix for 8a8ca142a488 ("mm: migrate: requeue
> destination folio on deferred split queue"), which is currently only in
> mm-stable, and should be backported together with it.
As far as I understand it, this should happen automatically.
8a8ca142a488 has cc:stable, this patch has Fixes:8a8ca142a488 and
also cc:stable.
There's enough info here for the -stable people to figure it out!
* Re: [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
2026-04-01 19:21 ` Zi Yan
@ 2026-04-01 22:55 ` Zi Yan
2026-04-01 23:19 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2026-04-01 22:55 UTC
To: akpm, Lance Yang
Cc: david, ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts,
dev.jain, baohua, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, ying.huang, apopple, richard.weiyang,
usama.arif, linux-mm, linux-kernel, kartikey406,
syzbot+a7067a757858ac8eb085, stable
On 1 Apr 2026, at 15:21, Zi Yan wrote:
> On 1 Apr 2026, at 9:10, Lance Yang wrote:
>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index ff9a42abd1b6..ac6d823e351f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4558,7 +4558,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>> goto next;
>> }
>> if (!folio_trylock(folio))
>> - goto next;
>> + goto requeue;
>> if (!split_folio(folio)) {
>> did_split = true;
>> if (underused)
>> @@ -4569,11 +4569,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>> next:
>> if (did_split || !folio_test_partially_mapped(folio))
>> continue;
>> +requeue:
>> /*
>> - * Only add back to the queue if folio is partially mapped.
>> - * If thp_underused returns false, or if split_folio fails
>> - * in the case it was underused, then consider it used and
>> - * don't add it back to split_queue.
>> + * Add back partially mapped folios, or underused folios
>> + * that we could not lock this round. If thp_underused()
>> + * returns false, or if split_folio() succeeds, or if
>> + * split_folio() fails in the case it was underused, then
>> + * consider it used and don't add it back to split_queue.
>> */
>
> Should the sentence
> “If thp_underused() returns false, or if split_folio() succeeds, or if
> split_folio() fails in the case it was underused, then
> consider it used and don't add it back to split_queue.”
> be moved to below label next?
>
> Since “thp_underused() returns false” is describing “if (!underused) goto next”,
> “split_folio() succeeds” is describing “did_split == true in the if”,
> “split_folio() fails in the case it was underused” is describing
> “did_split == false and !folio_test_partially_mapped(folio) in the if”.
>
> The first sentence matches the goto requeue for folio_trylock().
Hi Andrew,
Can you apply the fixup below to move the comment? Lance told me he
would be away for a while, so he could not send a fixup to move
the comment.
Thanks.
From 6ebeca9f7215cb91905d3f49385dbbafce5a80c2 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy@nvidia.com>
Date: Wed, 1 Apr 2026 18:52:43 -0400
Subject: [PATCH] move the comment.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/huge_memory.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ac6d823e351ff..970e077019b75 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4567,15 +4567,18 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
}
folio_unlock(folio);
next:
+ /*
+ * If thp_underused() returns false, or if split_folio()
+ * succeeds, or if split_folio() fails in the case it was
+ * underused, then consider it used and don't add it back to
+ * split_queue.
+ */
if (did_split || !folio_test_partially_mapped(folio))
continue;
requeue:
/*
- * Add back partially mapped folios, or underused folios
- * that we could not lock this round. If thp_underused()
- * returns false, or if split_folio() succeeds, or if
- * split_folio() fails in the case it was underused, then
- * consider it used and don't add it back to split_queue.
+ * Add back partially mapped folios, or underused folios that
+ * we could not lock this round.
*/
fqueue = folio_split_queue_lock_irqsave(folio, &flags);
if (list_empty(&folio->_deferred_list)) {
--
2.53.0
Best Regards,
Yan, Zi
* Re: [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
2026-04-01 22:55 ` Zi Yan
@ 2026-04-01 23:19 ` Andrew Morton
2026-04-03 4:24 ` Lance Yang
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2026-04-01 23:19 UTC
To: Zi Yan
Cc: Lance Yang, david, ljs, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, matthew.brost, joshua.hahnjy,
rakie.kim, byungchul, gourry, ying.huang, apopple,
richard.weiyang, usama.arif, linux-mm, linux-kernel, kartikey406,
syzbot+a7067a757858ac8eb085, stable
On Wed, 01 Apr 2026 18:55:48 -0400 Zi Yan <ziy@nvidia.com> wrote:
> Can you apply the fixup below to move the comment? Lance told me he
> would be away for a while, so he could not send a fixup to move
> the comment.
Thanks. I folded that into Lance's base patch, so here's the whole
thing:
From: Lance Yang <lance.yang@linux.dev>
Subject: mm: fix deferred split queue races during migration
Date: Wed, 1 Apr 2026 21:10:32 +0800
migrate_folio_move() records the deferred split queue state from src and
replays it on dst. Replaying it after remove_migration_ptes(src, dst, 0)
makes dst visible before it is requeued, so a concurrent rmap-removal path
can mark dst partially mapped and trip the WARN in deferred_split_folio().
Move the requeue before remove_migration_ptes() so dst is back on the
deferred split queue before it becomes visible again.
Because migration still holds dst locked at that point, teach
deferred_split_scan() to requeue a folio when folio_trylock() fails.
Otherwise a fully mapped underused folio can be dequeued by the shrinker
and silently lost from split_queue.
[ziy@nvidia.com: move the comment]
Link: https://lkml.kernel.org/r/FB71A764-0F10-4E5A-B4A0-BA4C7F138408@nvidia.com
Link: https://syzkaller.appspot.com/bug?extid=a7067a757858ac8eb085
Link: https://lkml.kernel.org/r/20260401131032.13011-1-lance.yang@linux.dev
Fixes: 8a8ca142a488 ("mm: migrate: requeue destination folio on deferred split queue")
Signed-off-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: syzbot+a7067a757858ac8eb085@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/69ccb65b.050a0220.183828.003a.GAE@google.com/
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Deepanshu Kartikey <kartikey406@gmail.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Usama Arif <usama.arif@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 15 ++++++++++-----
mm/migrate.c | 18 +++++++++---------
2 files changed, 19 insertions(+), 14 deletions(-)
--- a/mm/huge_memory.c~mm-fix-deferred-split-queue-races-during-migration
+++ a/mm/huge_memory.c
@@ -4542,7 +4542,7 @@ retry:
goto next;
}
if (!folio_trylock(folio))
- goto next;
+ goto requeue;
if (!split_folio(folio)) {
did_split = true;
if (underused)
@@ -4551,13 +4551,18 @@ retry:
}
folio_unlock(folio);
next:
+ /*
+ * If thp_underused() returns false, or if split_folio()
+ * succeeds, or if split_folio() fails in the case it was
+ * underused, then consider it used and don't add it back to
+ * split_queue.
+ */
if (did_split || !folio_test_partially_mapped(folio))
continue;
+requeue:
/*
- * Only add back to the queue if folio is partially mapped.
- * If thp_underused returns false, or if split_folio fails
- * in the case it was underused, then consider it used and
- * don't add it back to split_queue.
+ * Add back partially mapped folios, or underused folios that
+ * we could not lock this round.
*/
fqueue = folio_split_queue_lock_irqsave(folio, &flags);
if (list_empty(&folio->_deferred_list)) {
--- a/mm/migrate.c~mm-fix-deferred-split-queue-races-during-migration
+++ a/mm/migrate.c
@@ -1384,6 +1384,15 @@ static int migrate_folio_move(free_folio
goto out;
/*
+ * Requeue the destination folio on the deferred split queue if
+ * the source was on the queue. The source is unqueued in
+ * __folio_migrate_mapping(), so we recorded the state from
+ * before move_to_new_folio().
+ */
+ if (src_deferred_split)
+ deferred_split_folio(dst, src_partially_mapped);
+
+ /*
* When successful, push dst to LRU immediately: so that if it
* turns out to be an mlocked page, remove_migration_ptes() will
* automatically build up the correct dst->mlock_count for it.
@@ -1399,15 +1408,6 @@ static int migrate_folio_move(free_folio
if (old_page_state & PAGE_WAS_MAPPED)
remove_migration_ptes(src, dst, 0);
- /*
- * Requeue the destination folio on the deferred split queue if
- * the source was on the queue. The source is unqueued in
- * __folio_migrate_mapping(), so we recorded the state from
- * before move_to_new_folio().
- */
- if (src_deferred_split)
- deferred_split_folio(dst, src_partially_mapped);
-
out_unlock_both:
folio_unlock(dst);
folio_set_owner_migrate_reason(dst, reason);
_
* Re: [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
From: Lance Yang @ 2026-04-03 4:24 UTC (permalink / raw)
To: Andrew Morton, Zi Yan, david
Cc: ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, richard.weiyang, usama.arif,
linux-mm, linux-kernel, kartikey406, syzbot+a7067a757858ac8eb085,
stable
On 2026/4/2 07:19, Andrew Morton wrote:
> On Wed, 01 Apr 2026 18:55:48 -0400 Zi Yan <ziy@nvidia.com> wrote:
>
>> Can you apply the fixup below to move the comment? Lance told me he
>> would be away for a while, so he could not send a fixup to move
>> the comment.
>
> Thanks. I folded that into Lance's base patch so here's the whole
> thing:
>
Thank you all!
Lance