All of lore.kernel.org
 help / color / mirror / Atom feed
From: Usama Arif <usama.arif@linux.dev>
To: Zi Yan <ziy@nvidia.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	npache@redhat.com, linux-mm@kvack.org, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com, hannes@cmpxchg.org, rakie.kim@sk.com,
	byungchul@sk.com, gourry@gourry.net,
	ying.huang@linux.alibaba.com, apopple@nvidia.com,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
Date: Fri, 6 Mar 2026 19:15:07 +0300	[thread overview]
Message-ID: <28e48b47-f215-4e4a-b55a-01dbf293ff35@linux.dev> (raw)
In-Reply-To: <993B37FF-29E1-41F5-A1E8-F38B9CD24478@nvidia.com>



On 06/03/2026 14:46, Zi Yan wrote:
> On 6 Mar 2026, at 9:12, Usama Arif wrote:
> 
>> On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
>>> On 3/6/26 14:35, Usama Arif wrote:
>>>> During folio migration, __folio_migrate_mapping() removes the source
>>>> folio from the deferred split queue, but the destination folio is never
>>>> re-queued.  This causes underutilized THPs to escape the shrinker after
>>>> NUMA migration, since they silently drop off the deferred split list.
>>>>
>>>> Fix this by calling deferred_split_folio() on the destination folio
>>>> after a successful migration, for large rmappable folios.
>>>>
>>>> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
>>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>> ---
>>>>  mm/migrate.c | 11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index ece77ccb2ec0..98d0a594f7b7 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>>>  		remove_migration_ptes(src, dst, 0);
>>>>
>>>> +	/*
>>>> +	 * Requeue the destination folio on the deferred split queue if
>>>> +	 * the source was a large folio that was on the queue. Without
>>>> +	 * this, NUMA migration causes underutilized THPs to escape
>>>> +	 * the shrinker since the source is unqueued in
>>>> +	 * __folio_migrate_mapping() and the destination is never
>>>> +	 * re-queued.
>>>> +	 */
>>>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>>>> +		deferred_split_folio(dst, false);
>>>
>>> Doesn't that mean that you will readd any large folios, even if already
>>> previously taken off the list after scanning?
>>>
>>> So I am not sure if your "if the source was a large folio that was on
>>> the queue." comment is accurate?
>>>
>>
>> Yes you are right. How about something like below? We also won't need to check
>> for anon and non-device folios with this as we only set the the flag if it was
>> already on deferred_split list.
> 
> BTW, migrate_pages() tries to split partially mapped folios before migration[1],
> so what remains in the deferred_list would be:
> 
> 1. partially mapped but with a pin,
> 2. fully mapped but potentially underused.
> 

Yes, thats right.

> I wonder if you want to do an underused scan before migration and try to split
> underused THPs.

hmm, I think we should keep THPs as is if there is no memory pressure (proactive
or otherwise). Scanning THPs for zeros has a cost and we would also lose the benefit
of THPs when we dont need memory.

> Or to avoid this additional scan, find a way of detecting
> zero pages at page copy time and split it after migration.
> 

Yeah but I think we lose the benefits of THPs after migration when we dont need
additional memory?

> Anyway, it seems that all large folios are in this deferred_list. Maybe, like
> David suggested in his LSFMM proposal, we should scan large folios on LRU lists
> at reclaim time instead, since there is not much difference between deferred_list
> and LRU lists right now.
> 

Yeah the THP shrinker is a very basic implementation and there are a lot of 

> 
> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840
> 

Also Johannes pointed out its not great storing this information in page flags,
we can just keep it as local variable. This is what the patch would look like:


diff --git a/mm/migrate.c b/mm/migrate.c
index ece77ccb2ec0..48a972f158ab 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1360,6 +1360,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
        int rc;
        int old_page_state = 0;
        struct anon_vma *anon_vma = NULL;
+       bool src_deferred_split = false;
        struct list_head *prev;
 
        __migrate_folio_extract(dst, &old_page_state, &anon_vma);
@@ -1373,6 +1374,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
                goto out_unlock_both;
        }
 
+       if (folio_test_large(src) && folio_test_large_rmappable(src) &&
+           !data_race(list_empty(&src->_deferred_list)))
+               src_deferred_split = true;
+
        rc = move_to_new_folio(dst, src, mode);
        if (rc)
                goto out;
@@ -1393,6 +1398,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
        if (old_page_state & PAGE_WAS_MAPPED)
                remove_migration_ptes(src, dst, 0);
 
+       /*
+        * Requeue the destination folio on the deferred split queue if
+        * the source was on the queue.  The source is unqueued in
+        * __folio_migrate_mapping(), so we recorded the state from
+        * before move_to_new_folio().
+        */
+       if (src_deferred_split)
+               deferred_split_folio(dst, false);
+
 out_unlock_both:
        folio_unlock(dst);
        folio_set_owner_migrate_reason(dst, reason);
 


  reply	other threads:[~2026-03-06 16:15 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-06 13:35 [PATCH] mm: migrate: requeue destination folio on deferred split queue Usama Arif
2026-03-06 13:49 ` David Hildenbrand (Arm)
2026-03-06 14:12   ` Usama Arif
2026-03-06 14:46     ` Zi Yan
2026-03-06 16:15       ` Usama Arif [this message]
2026-03-06 16:23         ` David Hildenbrand (Arm)
2026-03-06 16:26         ` Zi Yan
2026-03-06 16:08     ` Matthew Wilcox
2026-03-06 16:19       ` Usama Arif
2026-03-06 13:51 ` David Hildenbrand (Arm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28e48b47-f215-4e4a-b55a-01dbf293ff35@linux.dev \
    --to=usama.arif@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=byungchul@sk.com \
    --cc=david@kernel.org \
    --cc=gourry@gourry.net \
    --cc=hannes@cmpxchg.org \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.brost@intel.com \
    --cc=npache@redhat.com \
    --cc=rakie.kim@sk.com \
    --cc=ying.huang@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.