The Linux Kernel Mailing List
* [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
@ 2026-03-10 10:54 Usama Arif
  2026-03-10 16:52 ` Zi Yan
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Usama Arif @ 2026-03-10 10:54 UTC (permalink / raw)
  To: Andrew Morton, npache, david, ziy, willy, linux-mm
  Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
	gourry, ying.huang, apopple, linux-kernel, kernel-team,
	Usama Arif

During folio migration, __folio_migrate_mapping() removes the source
folio from the deferred split queue, but the destination folio is never
re-queued.  This causes underutilized THPs to escape the shrinker after
NUMA migration, since they silently drop off the deferred split list.

Fix this by recording whether the source folio was on the deferred split
queue and its partially mapped state before move_to_new_folio() unqueues
it, and re-queuing the destination folio after a successful migration if
it was.

By the time migrate_folio_move() runs, partially mapped folios without a
pin have already been split by migrate_pages_batch().  So only two cases
remain on the deferred list at this point:
  1. Partially mapped folios with a pin (split failed).
  2. Fully mapped but potentially underused folios.
The recorded partially_mapped state is forwarded to deferred_split_folio()
so that the destination folio is correctly re-queued in both cases.

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Fixes: dafff3f4c850 ("mm: split underused THPs")
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
v1 -> v2:
- record whether source folio was on the deferred split queue before
  move_to_new_folio() (David)
- record partially mapped state and update commit message (Zi)
---
 mm/migrate.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index ece77ccb2ec0..61013d258eb4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	int rc;
 	int old_page_state = 0;
 	struct anon_vma *anon_vma = NULL;
+	bool src_deferred_split = false;
+	bool src_partially_mapped = false;
 	struct list_head *prev;
 
 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
@@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 		goto out_unlock_both;
 	}
 
+	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
+	    !data_race(list_empty(&src->_deferred_list))) {
+		src_deferred_split = true;
+		src_partially_mapped = folio_test_partially_mapped(src);
+	}
+
 	rc = move_to_new_folio(dst, src, mode);
 	if (rc)
 		goto out;
@@ -1393,6 +1401,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	if (old_page_state & PAGE_WAS_MAPPED)
 		remove_migration_ptes(src, dst, 0);
 
+	/*
+	 * Requeue the destination folio on the deferred split queue if
+	 * the source was on the queue.  The source is unqueued in
+	 * __folio_migrate_mapping(), so we recorded the state from
+	 * before move_to_new_folio().
+	 */
+	if (src_deferred_split)
+		deferred_split_folio(dst, src_partially_mapped);
+
 out_unlock_both:
 	folio_unlock(dst);
 	folio_set_owner_migrate_reason(dst, reason);
-- 
2.47.3
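
The record-then-requeue ordering described in the commit message can be
sketched in userspace as follows. This is only an illustrative model, not
the kernel code: struct toy_list, struct toy_folio and every toy_*()
function are hypothetical stand-ins for list_head, struct folio,
deferred_split_folio(), move_to_new_folio() and migrate_folio_move(), and
the sketch assumes the move step both unqueues the source and clears its
partially mapped state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list, modelled on list_head. */
struct toy_list {
	struct toy_list *prev, *next;
};

static void toy_list_init(struct toy_list *l)
{
	l->prev = l;
	l->next = l;
}

static bool toy_list_empty(const struct toy_list *l)
{
	return l->next == l;
}

static void toy_list_add_tail(struct toy_list *n, struct toy_list *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void toy_list_del_init(struct toy_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	toy_list_init(n);
}

/* Hypothetical folio: only the two pieces of state the patch records. */
struct toy_folio {
	struct toy_list deferred;	/* stand-in for _deferred_list */
	bool partially_mapped;
};

static void toy_folio_init(struct toy_folio *f)
{
	toy_list_init(&f->deferred);
	f->partially_mapped = false;
}

/* Stand-in for deferred_split_folio(): queue once, optionally mark. */
static void toy_deferred_split(struct toy_list *queue, struct toy_folio *f,
			       bool partially_mapped)
{
	assert(queue != NULL && f != NULL);
	if (partially_mapped)
		f->partially_mapped = true;
	if (toy_list_empty(&f->deferred))
		toy_list_add_tail(&f->deferred, queue);
}

/* Stand-in for the move step: on success the source loses its queue state. */
static int toy_move(struct toy_folio *src, bool fail)
{
	if (fail)
		return -1;
	toy_list_del_init(&src->deferred);
	src->partially_mapped = false;
	return 0;
}

/* Record the source state first, move, then requeue the destination. */
static int toy_migrate(struct toy_list *queue, struct toy_folio *dst,
		       struct toy_folio *src, bool fail)
{
	bool src_deferred_split = false;
	bool src_partially_mapped = false;
	int rc;

	if (!toy_list_empty(&src->deferred)) {
		src_deferred_split = true;
		src_partially_mapped = src->partially_mapped;
	}

	rc = toy_move(src, fail);
	if (rc)
		return rc;

	if (src_deferred_split)
		toy_deferred_split(queue, dst, src_partially_mapped);
	return 0;
}
```

Sampling the state after toy_move() would always observe an empty list,
which is why the model, like the patch, records it beforehand.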



* Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
  2026-03-10 10:54 [PATCH v2] mm: migrate: requeue destination folio on deferred split queue Usama Arif
@ 2026-03-10 16:52 ` Zi Yan
  2026-03-11  9:23 ` David Hildenbrand (Arm)
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Zi Yan @ 2026-03-10 16:52 UTC (permalink / raw)
  To: Usama Arif
  Cc: Andrew Morton, npache, david, willy, linux-mm, matthew.brost,
	joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
	apopple, linux-kernel, kernel-team

On 10 Mar 2026, at 6:54, Usama Arif wrote:

> During folio migration, __folio_migrate_mapping() removes the source
> folio from the deferred split queue, but the destination folio is never
> re-queued.  This causes underutilized THPs to escape the shrinker after
> NUMA migration, since they silently drop off the deferred split list.
>
> Fix this by recording whether the source folio was on the deferred split
> queue and its partially mapped state before move_to_new_folio() unqueues
> it, and re-queuing the destination folio after a successful migration if
> it was.
>
> By the time migrate_folio_move() runs, partially mapped folios without a
> pin have already been split by migrate_pages_batch().  So only two cases
> remain on the deferred list at this point:
>   1. Partially mapped folios with a pin (split failed).
>   2. Fully mapped but potentially underused folios.
> The recorded partially_mapped state is forwarded to deferred_split_folio()
> so that the destination folio is correctly re-queued in both cases.
>
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Fixes: dafff3f4c850 ("mm: split underused THPs")
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
> v1 -> v2:
> - record whether source folio was on the deferred split queue before
>   move_to_new_folio() (David)
> - record partially mapped state and update commit message (Zi)
> ---
>  mm/migrate.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
LGTM.

Acked-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi


* Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
  2026-03-10 10:54 [PATCH v2] mm: migrate: requeue destination folio on deferred split queue Usama Arif
  2026-03-10 16:52 ` Zi Yan
@ 2026-03-11  9:23 ` David Hildenbrand (Arm)
  2026-03-11 13:25   ` Usama Arif
  2026-03-12  3:18 ` Wei Yang
  2026-06-20  7:27 ` Wei Yang
  3 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-11  9:23 UTC (permalink / raw)
  To: Usama Arif, Andrew Morton, npache, ziy, willy, linux-mm
  Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
	gourry, ying.huang, apopple, linux-kernel, kernel-team

On 3/10/26 11:54, Usama Arif wrote:
> During folio migration, __folio_migrate_mapping() removes the source
> folio from the deferred split queue, but the destination folio is never
> re-queued.  This causes underutilized THPs to escape the shrinker after
> NUMA migration, since they silently drop off the deferred split list.
> 
> Fix this by recording whether the source folio was on the deferred split
> queue and its partially mapped state before move_to_new_folio() unqueues
> it, and re-queuing the destination folio after a successful migration if
> it was.
> 
> By the time migrate_folio_move() runs, partially mapped folios without a
> pin have already been split by migrate_pages_batch().  So only two cases
> remain on the deferred list at this point:
>   1. Partially mapped folios with a pin (split failed).
>   2. Fully mapped but potentially underused folios.
> The recorded partially_mapped state is forwarded to deferred_split_folio()
> so that the destination folio is correctly re-queued in both cases.
> 
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Fixes: dafff3f4c850 ("mm: split underused THPs")
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
> v1 -> v2:
> - record whether source folio was on the deferred split queue before
>   move_to_new_folio() (David)
> - record partially mapped state and update commit message (Zi)
> ---
>  mm/migrate.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..61013d258eb4 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	int rc;
>  	int old_page_state = 0;
>  	struct anon_vma *anon_vma = NULL;
> +	bool src_deferred_split = false;
> +	bool src_partially_mapped = false;
>  	struct list_head *prev;
>  
>  	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
> @@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  		goto out_unlock_both;
>  	}
>  
> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&

I don't think the folio_test_large_rmappable() check is required. Other
folios we migrate here would always have _deferred_list initialized but
unused.
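
A minimal sketch of that argument, assuming the list head really is always
initialised for the folios in question: an initialised but never queued
list head already reports "empty", so an extra flag test in front of it
does not change the result. struct toy_list, struct toy_folio and the
toy_*() helpers are hypothetical userspace stand-ins, not kernel API.

```c
#include <stdbool.h>

/* Hypothetical list head, modelled on the kernel's list_head. */
struct toy_list {
	struct toy_list *prev, *next;
};

static void toy_list_init(struct toy_list *l)
{
	l->prev = l;
	l->next = l;
}

static bool toy_list_empty(const struct toy_list *l)
{
	return l->next == l;
}

static void toy_list_add_tail(struct toy_list *n, struct toy_list *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical folio with a flag and an always-initialised list head. */
struct toy_folio {
	bool large_rmappable;
	struct toy_list deferred;
};

static void toy_folio_init(struct toy_folio *f, bool large_rmappable)
{
	f->large_rmappable = large_rmappable;
	toy_list_init(&f->deferred);
}

/* Check with the extra flag test, as in the posted patch. */
static bool toy_queued_with_flag_check(const struct toy_folio *f)
{
	return f->large_rmappable && !toy_list_empty(&f->deferred);
}

/* Check without it: same answer whenever only flagged folios get queued. */
static bool toy_queued(const struct toy_folio *f)
{
	return !toy_list_empty(&f->deferred);
}
```

The equivalence only holds while the list head exists and has been
initialised for every folio that reaches the check.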

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


* Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
  2026-03-11  9:23 ` David Hildenbrand (Arm)
@ 2026-03-11 13:25   ` Usama Arif
  0 siblings, 0 replies; 10+ messages in thread
From: Usama Arif @ 2026-03-11 13:25 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Andrew Morton, npache, ziy, willy,
	linux-mm
  Cc: matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
	gourry, ying.huang, apopple, linux-kernel, kernel-team



On 11/03/2026 12:23, David Hildenbrand (Arm) wrote:
> On 3/10/26 11:54, Usama Arif wrote:
>> During folio migration, __folio_migrate_mapping() removes the source
>> folio from the deferred split queue, but the destination folio is never
>> re-queued.  This causes underutilized THPs to escape the shrinker after
>> NUMA migration, since they silently drop off the deferred split list.
>>
>> Fix this by recording whether the source folio was on the deferred split
>> queue and its partially mapped state before move_to_new_folio() unqueues
>> it, and re-queuing the destination folio after a successful migration if
>> it was.
>>
>> By the time migrate_folio_move() runs, partially mapped folios without a
>> pin have already been split by migrate_pages_batch().  So only two cases
>> remain on the deferred list at this point:
>>   1. Partially mapped folios with a pin (split failed).
>>   2. Fully mapped but potentially underused folios.
>> The recorded partially_mapped state is forwarded to deferred_split_folio()
>> so that the destination folio is correctly re-queued in both cases.
>>
>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>> ---
>> v1 -> v2:
>> - record whether source folio was on the deferred split queue before
>>   move_to_new_folio() (David)
>> - record partially mapped state and update commit message (Zi)
>> ---
>>  mm/migrate.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index ece77ccb2ec0..61013d258eb4 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>  	int rc;
>>  	int old_page_state = 0;
>>  	struct anon_vma *anon_vma = NULL;
>> +	bool src_deferred_split = false;
>> +	bool src_partially_mapped = false;
>>  	struct list_head *prev;
>>  
>>  	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
>> @@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>  		goto out_unlock_both;
>>  	}
>>  
>> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
> 
> I don't think the folio_test_large_rmappable() check is required. Other
> folios we migrate here would always have _deferred_list initialized but
> unused.
> 
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> 


I have been auditing the THP shrinker code when it comes to NUMA migration,
and I think we need another fix for this. I have sent it here:
https://lore.kernel.org/all/20260311132342.3193160-1-usama.arif@linux.dev/


* Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
  2026-03-10 10:54 [PATCH v2] mm: migrate: requeue destination folio on deferred split queue Usama Arif
  2026-03-10 16:52 ` Zi Yan
  2026-03-11  9:23 ` David Hildenbrand (Arm)
@ 2026-03-12  3:18 ` Wei Yang
  2026-03-12  8:26   ` David Hildenbrand (Arm)
  2026-06-20  7:27 ` Wei Yang
  3 siblings, 1 reply; 10+ messages in thread
From: Wei Yang @ 2026-03-12  3:18 UTC (permalink / raw)
  To: Usama Arif
  Cc: Andrew Morton, npache, david, ziy, willy, linux-mm, matthew.brost,
	joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
	apopple, linux-kernel, kernel-team

On Tue, Mar 10, 2026 at 03:54:19AM -0700, Usama Arif wrote:
>During folio migration, __folio_migrate_mapping() removes the source
>folio from the deferred split queue, but the destination folio is never
>re-queued.  This causes underutilized THPs to escape the shrinker after
>NUMA migration, since they silently drop off the deferred split list.
>
>Fix this by recording whether the source folio was on the deferred split
>queue and its partially mapped state before move_to_new_folio() unqueues
>it, and re-queuing the destination folio after a successful migration if
>it was.
>
>By the time migrate_folio_move() runs, partially mapped folios without a
>pin have already been split by migrate_pages_batch().  So only two cases
>remain on the deferred list at this point:
>  1. Partially mapped folios with a pin (split failed).
>  2. Fully mapped but potentially underused folios.
>The recorded partially_mapped state is forwarded to deferred_split_folio()
>so that the destination folio is correctly re-queued in both cases.
>
>Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>Fixes: dafff3f4c850 ("mm: split underused THPs")
>Signed-off-by: Usama Arif <usama.arif@linux.dev>
>---
>v1 -> v2:
>- record whether source folio was on the deferred split queue before
>  move_to_new_folio() (David)
>- record partially mapped state and update commit message (Zi)
>---
> mm/migrate.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
>diff --git a/mm/migrate.c b/mm/migrate.c
>index ece77ccb2ec0..61013d258eb4 100644
>--- a/mm/migrate.c
>+++ b/mm/migrate.c
>@@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> 	int rc;
> 	int old_page_state = 0;
> 	struct anon_vma *anon_vma = NULL;
>+	bool src_deferred_split = false;
>+	bool src_partially_mapped = false;
> 	struct list_head *prev;
> 
> 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
>@@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> 		goto out_unlock_both;
> 	}
> 
>+	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
>+	    !data_race(list_empty(&src->_deferred_list))) {

We usually check order > 1 before accessing _deferred_list, because it lives
in subpage 2.

I am not sure why we don't do that here. Am I missing something?

>+		src_deferred_split = true;
>+		src_partially_mapped = folio_test_partially_mapped(src);
>+	}
>+
> 	rc = move_to_new_folio(dst, src, mode);
> 	if (rc)
> 		goto out;
>@@ -1393,6 +1401,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> 	if (old_page_state & PAGE_WAS_MAPPED)
> 		remove_migration_ptes(src, dst, 0);
> 
>+	/*
>+	 * Requeue the destination folio on the deferred split queue if
>+	 * the source was on the queue.  The source is unqueued in
>+	 * __folio_migrate_mapping(), so we recorded the state from
>+	 * before move_to_new_folio().
>+	 */
>+	if (src_deferred_split)
>+		deferred_split_folio(dst, src_partially_mapped);
>+
> out_unlock_both:
> 	folio_unlock(dst);
> 	folio_set_owner_migrate_reason(dst, reason);
>-- 
>2.47.3
>

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
  2026-03-12  3:18 ` Wei Yang
@ 2026-03-12  8:26   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-12  8:26 UTC (permalink / raw)
  To: Wei Yang, Usama Arif
  Cc: Andrew Morton, npache, ziy, willy, linux-mm, matthew.brost,
	joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
	apopple, linux-kernel, kernel-team

On 3/12/26 04:18, Wei Yang wrote:
> On Tue, Mar 10, 2026 at 03:54:19AM -0700, Usama Arif wrote:
>> During folio migration, __folio_migrate_mapping() removes the source
>> folio from the deferred split queue, but the destination folio is never
>> re-queued.  This causes underutilized THPs to escape the shrinker after
>> NUMA migration, since they silently drop off the deferred split list.
>>
>> Fix this by recording whether the source folio was on the deferred split
>> queue and its partially mapped state before move_to_new_folio() unqueues
>> it, and re-queuing the destination folio after a successful migration if
>> it was.
>>
>> By the time migrate_folio_move() runs, partially mapped folios without a
>> pin have already been split by migrate_pages_batch().  So only two cases
>> remain on the deferred list at this point:
>>  1. Partially mapped folios with a pin (split failed).
>>  2. Fully mapped but potentially underused folios.
>> The recorded partially_mapped state is forwarded to deferred_split_folio()
>> so that the destination folio is correctly re-queued in both cases.
>>
>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>> ---
>> v1 -> v2:
>> - record whether source folio was on the deferred split queue before
>>  move_to_new_folio() (David)
>> - record partially mapped state and update commit message (Zi)
>> ---
>> mm/migrate.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index ece77ccb2ec0..61013d258eb4 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>> 	int rc;
>> 	int old_page_state = 0;
>> 	struct anon_vma *anon_vma = NULL;
>> +	bool src_deferred_split = false;
>> +	bool src_partially_mapped = false;
>> 	struct list_head *prev;
>>
>> 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
>> @@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>> 		goto out_unlock_both;
>> 	}
>>
>> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
>> +	    !data_race(list_empty(&src->_deferred_list))) {
> 
> We usually check order > 1 before accessing _deferred_list, because it lives
> in subpage 2.
> 
> I am not sure why we don't do that here. Am I missing something?

Valid point! Non-anon folios could trigger that.
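
The arithmetic behind the order > 1 rule can be sketched as below. It takes
from the thread only the statement that _deferred_list sits in subpage 2;
TOY_DEFERRED_LIST_PAGE and the toy_*() helpers are hypothetical names used
for illustration.

```c
#include <stdbool.h>

/* Per the thread: the deferred list head lives in subpage index 2. */
#define TOY_DEFERRED_LIST_PAGE 2UL

/* A folio of a given order spans 2^order pages. */
static unsigned long toy_nr_pages(unsigned int order)
{
	return 1UL << order;
}

/*
 * The field is only backed by the folio's own pages when the folio has
 * at least three of them, i.e. when order > 1.
 */
static bool toy_has_deferred_list(unsigned int order)
{
	return TOY_DEFERRED_LIST_PAGE < toy_nr_pages(order);
}
```

An order-1 folio spans pages 0 and 1 only, so a field stored in page 2
would belong to whatever follows the folio; that is the case an
order > 1 test rules out before list_empty() is evaluated.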

-- 
Cheers,

David


* Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
  2026-03-10 10:54 [PATCH v2] mm: migrate: requeue destination folio on deferred split queue Usama Arif
                   ` (2 preceding siblings ...)
  2026-03-12  3:18 ` Wei Yang
@ 2026-06-20  7:27 ` Wei Yang
  2026-06-22  9:16   ` David Hildenbrand (Arm)
  3 siblings, 1 reply; 10+ messages in thread
From: Wei Yang @ 2026-06-20  7:27 UTC (permalink / raw)
  To: Usama Arif
  Cc: Andrew Morton, npache, david, ziy, willy, linux-mm, matthew.brost,
	joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
	apopple, linux-kernel, kernel-team

On Tue, Mar 10, 2026 at 03:54:19AM -0700, Usama Arif wrote:
>During folio migration, __folio_migrate_mapping() removes the source
>folio from the deferred split queue, but the destination folio is never
>re-queued.  This causes underutilized THPs to escape the shrinker after
>NUMA migration, since they silently drop off the deferred split list.
>
>Fix this by recording whether the source folio was on the deferred split
>queue and its partially mapped state before move_to_new_folio() unqueues
>it, and re-queuing the destination folio after a successful migration if
>it was.
>
>By the time migrate_folio_move() runs, partially mapped folios without a
>pin have already been split by migrate_pages_batch().  So only two cases
>remain on the deferred list at this point:
>  1. Partially mapped folios with a pin (split failed).
>  2. Fully mapped but potentially underused folios.
>The recorded partially_mapped state is forwarded to deferred_split_folio()
>so that the destination folio is correctly re-queued in both cases.
>
>Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>Fixes: dafff3f4c850 ("mm: split underused THPs")
>Signed-off-by: Usama Arif <usama.arif@linux.dev>
>---
>v1 -> v2:
>- record whether source folio was on the deferred split queue before
>  move_to_new_folio() (David)
>- record partially mapped state and update commit message (Zi)
>---
> mm/migrate.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
>diff --git a/mm/migrate.c b/mm/migrate.c
>index ece77ccb2ec0..61013d258eb4 100644
>--- a/mm/migrate.c
>+++ b/mm/migrate.c
>@@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> 	int rc;
> 	int old_page_state = 0;
> 	struct anon_vma *anon_vma = NULL;
>+	bool src_deferred_split = false;
>+	bool src_partially_mapped = false;
> 	struct list_head *prev;
> 
> 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
>@@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> 		goto out_unlock_both;
> 	}
> 
>+	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
>+	    !data_race(list_empty(&src->_deferred_list))) {
>+		src_deferred_split = true;
>+		src_partially_mapped = folio_test_partially_mapped(src);
>+	}

Hi, Usama

I am afraid there may be a race between migration and deferred split.

                A                              B
  migrate_pages_batch                   deferred_split_scan
    migrate_folio_unmap                   list_del_init(&folio->_deferred_list)
      folio_lock/folio_trylock

    migrate_folios_move
      migrate_folio_move
        list_empty(&src->_deferred_list)
                                          folio_trylock()
                                          requeue:

If the list_empty() check happens after the folio is removed from the
deferred list but before it is requeued, we will miss this folio.

And the behavior was the same before commit fafaeceb89a5, IIUC.

Do you think this can happen?

>+
> 	rc = move_to_new_folio(dst, src, mode);
> 	if (rc)
> 		goto out;
>@@ -1393,6 +1401,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> 	if (old_page_state & PAGE_WAS_MAPPED)
> 		remove_migration_ptes(src, dst, 0);
> 
>+	/*
>+	 * Requeue the destination folio on the deferred split queue if
>+	 * the source was on the queue.  The source is unqueued in
>+	 * __folio_migrate_mapping(), so we recorded the state from
>+	 * before move_to_new_folio().
>+	 */
>+	if (src_deferred_split)
>+		deferred_split_folio(dst, src_partially_mapped);
>+
> out_unlock_both:
> 	folio_unlock(dst);
> 	folio_set_owner_migrate_reason(dst, reason);
>-- 
>2.47.3
>

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
  2026-06-20  7:27 ` Wei Yang
@ 2026-06-22  9:16   ` David Hildenbrand (Arm)
  2026-06-22 13:43     ` Wei Yang
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-22  9:16 UTC (permalink / raw)
  To: Wei Yang, Usama Arif
  Cc: Andrew Morton, npache, ziy, willy, linux-mm, matthew.brost,
	joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
	apopple, linux-kernel, kernel-team

On 6/20/26 09:27, Wei Yang wrote:
> On Tue, Mar 10, 2026 at 03:54:19AM -0700, Usama Arif wrote:
>> During folio migration, __folio_migrate_mapping() removes the source
>> folio from the deferred split queue, but the destination folio is never
>> re-queued.  This causes underutilized THPs to escape the shrinker after
>> NUMA migration, since they silently drop off the deferred split list.
>>
>> Fix this by recording whether the source folio was on the deferred split
>> queue and its partially mapped state before move_to_new_folio() unqueues
>> it, and re-queuing the destination folio after a successful migration if
>> it was.
>>
>> By the time migrate_folio_move() runs, partially mapped folios without a
>> pin have already been split by migrate_pages_batch().  So only two cases
>> remain on the deferred list at this point:
>>  1. Partially mapped folios with a pin (split failed).
>>  2. Fully mapped but potentially underused folios.
>> The recorded partially_mapped state is forwarded to deferred_split_folio()
>> so that the destination folio is correctly re-queued in both cases.
>>
>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>> ---
>> v1 -> v2:
>> - record whether source folio was on the deferred split queue before
>>  move_to_new_folio() (David)
>> - record partially mapped state and update commit message (Zi)
>> ---
>> mm/migrate.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index ece77ccb2ec0..61013d258eb4 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>> 	int rc;
>> 	int old_page_state = 0;
>> 	struct anon_vma *anon_vma = NULL;
>> +	bool src_deferred_split = false;
>> +	bool src_partially_mapped = false;
>> 	struct list_head *prev;
>>
>> 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
>> @@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>> 		goto out_unlock_both;
>> 	}
>>
>> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
>> +	    !data_race(list_empty(&src->_deferred_list))) {
>> +		src_deferred_split = true;
>> +		src_partially_mapped = folio_test_partially_mapped(src);
>> +	}
> 
> Hi, Usama
> 
> I am afraid there may be a race between migration and deferred split.
> 
>                 A                              B
>   migrate_pages_batch                   deferred_split_scan
>     migrate_folio_unmap                   list_del_init(&folio->_deferred_list)
>       folio_lock/folio_trylock
> 
>     migrate_folios_move
>       migrate_folio_move
>         list_empty(&src->_deferred_list)
>                                           folio_trylock()
>                                           requeue:
> 
> If the list_empty() check happens after the folio is removed from the
> deferred list but before it is requeued, we will miss this folio.

deferred_split_isolate() would grab a reference through folio_try_get().

How can we migrate a folio with a raised refcount?

-- 
Cheers,

David


* Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
  2026-06-22  9:16   ` David Hildenbrand (Arm)
@ 2026-06-22 13:43     ` Wei Yang
  2026-06-22 16:44       ` Usama Arif
  0 siblings, 1 reply; 10+ messages in thread
From: Wei Yang @ 2026-06-22 13:43 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Wei Yang, Usama Arif, Andrew Morton, npache, ziy, willy, linux-mm,
	matthew.brost, joshua.hahnjy, hannes, rakie.kim, byungchul,
	gourry, ying.huang, apopple, linux-kernel, kernel-team

On Mon, Jun 22, 2026 at 11:16:39AM +0200, David Hildenbrand (Arm) wrote:
>On 6/20/26 09:27, Wei Yang wrote:
>> On Tue, Mar 10, 2026 at 03:54:19AM -0700, Usama Arif wrote:
>>> During folio migration, __folio_migrate_mapping() removes the source
>>> folio from the deferred split queue, but the destination folio is never
>>> re-queued.  This causes underutilized THPs to escape the shrinker after
>>> NUMA migration, since they silently drop off the deferred split list.
>>>
>>> Fix this by recording whether the source folio was on the deferred split
>>> queue and its partially mapped state before move_to_new_folio() unqueues
>>> it, and re-queuing the destination folio after a successful migration if
>>> it was.
>>>
>>> By the time migrate_folio_move() runs, partially mapped folios without a
>>> pin have already been split by migrate_pages_batch().  So only two cases
>>> remain on the deferred list at this point:
>>>  1. Partially mapped folios with a pin (split failed).
>>>  2. Fully mapped but potentially underused folios.
>>> The recorded partially_mapped state is forwarded to deferred_split_folio()
>>> so that the destination folio is correctly re-queued in both cases.
>>>
>>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>> ---
>>> v1 -> v2:
>>> - record whether source folio was on the deferred split queue before
>>>  move_to_new_folio() (David)
>>> - record partially mapped state and update commit message (Zi)
>>> ---
>>> mm/migrate.c | 17 +++++++++++++++++
>>> 1 file changed, 17 insertions(+)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index ece77ccb2ec0..61013d258eb4 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>> 	int rc;
>>> 	int old_page_state = 0;
>>> 	struct anon_vma *anon_vma = NULL;
>>> +	bool src_deferred_split = false;
>>> +	bool src_partially_mapped = false;
>>> 	struct list_head *prev;
>>>
>>> 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
>>> @@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>> 		goto out_unlock_both;
>>> 	}
>>>
>>> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
>>> +	    !data_race(list_empty(&src->_deferred_list))) {
>>> +		src_deferred_split = true;
>>> +		src_partially_mapped = folio_test_partially_mapped(src);
>>> +	}
>> 
>> Hi, Usama
>> 
>> I am afraid there may be a race between migration and deferred split.
>> 
>>                 A                              B
>>   migrate_pages_batch                   deferred_split_scan
>>     migrate_folio_unmap                   list_del_init(&folio->_deferred_list)
>>       folio_lock/folio_trylock
>> 
>>     migrate_folios_move
>>       migrate_folio_move
>>         list_empty(&src->_deferred_list)
>>                                           folio_trylock()
>>                                           requeue:
>> 
>> If the list_empty() check happens after the folio is removed from the
>> deferred list but before it is requeued, we will miss this folio.
>
>deferred_split_isolate() would grab a reference through folio_try_get().
>
>How can we migrate a folio with a raised refcount?
>

Thanks, I missed the expected_refcount check in __migrate_folio().
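
Why the extra reference closes the window can be sketched with a small
model. It assumes, as stated above, that the scanner takes a reference
before it pulls the folio off the list, and that migration gives up when
the reference count differs from the expected one. struct toy_folio and
the toy_*() helpers are hypothetical; the real check is the expected
refcount test referred to above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical folio: a reference count and deferred-queue membership. */
struct toy_folio {
	int refcount;
	bool queued;
};

/* Scanner side: pin the folio first, then take it off the queue. */
static bool toy_scan_isolate(struct toy_folio *f)
{
	if (f->refcount == 0)
		return false;	/* models a failed try-get */
	f->refcount++;
	f->queued = false;
	return true;
}

/* Scanner side: put the folio back on the queue and drop the pin. */
static void toy_scan_requeue(struct toy_folio *f)
{
	f->queued = true;
	f->refcount--;
}

/*
 * Migration side: sample the queue state, then refuse to proceed when
 * somebody else holds a reference.
 */
static int toy_migrate(const struct toy_folio *f, int expected_refcount,
		       bool *saw_queued)
{
	*saw_queued = f->queued;
	if (f->refcount != expected_refcount)
		return -EAGAIN;
	return 0;
}
```

In this model a stale "not queued" observation can only occur while the
scanner holds its pin, and in exactly that interval toy_migrate() returns
-EAGAIN, so the stale value is never acted upon.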

>-- 
>Cheers,
>
>David

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
  2026-06-22 13:43     ` Wei Yang
@ 2026-06-22 16:44       ` Usama Arif
  0 siblings, 0 replies; 10+ messages in thread
From: Usama Arif @ 2026-06-22 16:44 UTC (permalink / raw)
  To: Wei Yang, David Hildenbrand (Arm)
  Cc: Andrew Morton, npache, ziy, willy, linux-mm, matthew.brost,
	joshua.hahnjy, hannes, rakie.kim, byungchul, gourry, ying.huang,
	apopple, linux-kernel, kernel-team



On 22/06/2026 14:43, Wei Yang wrote:
> On Mon, Jun 22, 2026 at 11:16:39AM +0200, David Hildenbrand (Arm) wrote:
>> On 6/20/26 09:27, Wei Yang wrote:
>>> On Tue, Mar 10, 2026 at 03:54:19AM -0700, Usama Arif wrote:
>>>> During folio migration, __folio_migrate_mapping() removes the source
>>>> folio from the deferred split queue, but the destination folio is never
>>>> re-queued.  This causes underutilized THPs to escape the shrinker after
>>>> NUMA migration, since they silently drop off the deferred split list.
>>>>
>>>> Fix this by recording whether the source folio was on the deferred split
>>>> queue and its partially mapped state before move_to_new_folio() unqueues
>>>> it, and re-queuing the destination folio after a successful migration if
>>>> it was.
>>>>
>>>> By the time migrate_folio_move() runs, partially mapped folios without a
>>>> pin have already been split by migrate_pages_batch().  So only two cases
>>>> remain on the deferred list at this point:
>>>>  1. Partially mapped folios with a pin (split failed).
>>>>  2. Fully mapped but potentially underused folios.
>>>> The recorded partially_mapped state is forwarded to deferred_split_folio()
>>>> so that the destination folio is correctly re-queued in both cases.
>>>>
>>>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>> ---
>>>> v1 -> v2:
>>>> - record whether source folio was on the deferred split queue before
>>>>  move_to_new_folio() (David)
>>>> - record partially mapped state and update commit message (Zi)
>>>> ---
>>>> mm/migrate.c | 17 +++++++++++++++++
>>>> 1 file changed, 17 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index ece77ccb2ec0..61013d258eb4 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>> 	int rc;
>>>> 	int old_page_state = 0;
>>>> 	struct anon_vma *anon_vma = NULL;
>>>> +	bool src_deferred_split = false;
>>>> +	bool src_partially_mapped = false;
>>>> 	struct list_head *prev;
>>>>
>>>> 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
>>>> @@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>> 		goto out_unlock_both;
>>>> 	}
>>>>
>>>> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
>>>> +	    !data_race(list_empty(&src->_deferred_list))) {
>>>> +		src_deferred_split = true;
>>>> +		src_partially_mapped = folio_test_partially_mapped(src);
>>>> +	}
>>>
>>> Hi, Usama
>>>
>>> I am afraid there may be a race between migration and deferred split.
>>>
>>>                 A                              B
>>>   migrate_pages_batch                   deferred_split_scan
>>>     migrate_folio_unmap                   list_del_init(&folio->_deferred_list)
>>>       folio_lock/folio_trylock
>>>
>>>     migrate_folios_move
>>>       migrate_folio_move
>>>         list_empty(&src->_deferred_list)
>>>                                           folio_trylock()
>>>                                           requeue:
>>>
>>> If the list_empty() check happens after the folio is removed from the
>>> deferred list but before it is requeued, we will miss this folio.
>>
>> deferred_split_isolate() would grab a reference through folio_try_get().
>>
>> How can we migrate a folio with a raised refcount?
>>
> 
> Thanks, I missed the expected_refcount check in __migrate_folio().
> 

Thanks David for pointing it out! I have just started looking at the
mailing list for today :)

>> -- 
>> Cheers,
>>
>> David
> 




Thread overview: 10+ messages
2026-03-10 10:54 [PATCH v2] mm: migrate: requeue destination folio on deferred split queue Usama Arif
2026-03-10 16:52 ` Zi Yan
2026-03-11  9:23 ` David Hildenbrand (Arm)
2026-03-11 13:25   ` Usama Arif
2026-03-12  3:18 ` Wei Yang
2026-03-12  8:26   ` David Hildenbrand (Arm)
2026-06-20  7:27 ` Wei Yang
2026-06-22  9:16   ` David Hildenbrand (Arm)
2026-06-22 13:43     ` Wei Yang
2026-06-22 16:44       ` Usama Arif
