From: Wei Yang <richard.weiyang@gmail.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>,
	akpm@linux-foundation.org, ljs@kernel.org, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
	npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
	baohua@kernel.org, lance.yang@linux.dev, riel@surriel.com,
	vbabka@kernel.org, harry@kernel.org, jannh@google.com,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	shuah@kernel.org, linux-mm@kvack.org,
	Gavin Guo <gavinguo@igalia.com>
Subject: Re: [PATCH 1/2] mm/huge_memory: return true if split_huge_pmd_locked() split PMD to migration entry
Date: Tue, 5 May 2026 03:15:14 +0000
Message-ID: <20260505031514.hlnn5o7wkad4teo2@master>
In-Reply-To: <ccb23360-d257-491e-926e-2a7f7b9c98c6@kernel.org>

On Mon, May 04, 2026 at 02:44:43PM +0200, David Hildenbrand (Arm) wrote:
>On 5/3/26 02:38, Wei Yang wrote:
>> On Wed, Apr 29, 2026 at 08:55:07AM +0200, David Hildenbrand (Arm) wrote:
>>> On 4/29/26 04:49, Wei Yang wrote:
>>>>
>>>> Below is my proposed change:
>>>>
>>>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>>> index a4d52fdb3056..6e915d35ae54 100644
>>>> --- a/mm/page_vma_mapped.c
>>>> +++ b/mm/page_vma_mapped.c
>>>> @@ -273,17 +273,21 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>>>  
>>>>  			if (softleaf_is_device_private(entry)) {
>>>>  				pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>>> -				return true;
>>>> +				if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd)))
>>>> +					return true;
>>>
>>> As we have a softleaf entry, I assume we wouldn't expect to get any other bits
>>> (access/dirty) set until we grab the lock. Verifying
>>> softleaf_is_device_private() again would be cleaner, though.
>>>
>> 
>> Got it.
>> 
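A minimal sketch of that suggestion, re-reading and re-classifying the
entry once the lock is held instead of comparing raw pmd values (using
the softleaf helpers as named in this thread; the fall-through target is
an assumption):

	pvmw->ptl = pmd_lock(mm, pvmw->pmd);
	entry = softleaf_from_pmd(pmdp_get(pvmw->pmd));
	if (softleaf_is_device_private(entry))
		return true;
	/* entry changed under us: unlock and handle on pte level */
	spin_unlock(pvmw->ptl);
	pvmw->ptl = NULL;
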
>>> But really, I do wonder if we should just have a "goto retry" back to the "pmde
>>> = pmdp_get_lockless(pvmw->pmd);" instead?
>>>
>> 
>> Sounds reasonable. See below.
>> 
>>>
>>> And now I wonder why we don't have check_pmd() handling in there? :/
>>>
>>> Should we check for the pfn here?
>> 
>> Thanks for pointing out. I think you are right.
>> 
>> After re-reading the code, more questions came to mind. I am afraid we need
>> more cleanup in page_vma_mapped_walk().
>> 
>> Below are my findings based on my current understanding:
>> 
>> 1. thp_migration_supported() seems unnecessary
>> 
>>    Reaching this code means pmd_is_migration_entry() returned true, which
>>    means CONFIG_ARCH_ENABLE_THP_MIGRATION is set; otherwise
>>    softleaf_from_pmd() would have returned softleaf_mk_none(), which is
>>    not a migration softleaf.
>> 
>>    CONFIG_ARCH_ENABLE_THP_MIGRATION being set in turn means
>>    CONFIG_TRANSPARENT_HUGEPAGE is set, so thp_migration_supported() must
>>    return true. (See the definition quoted below.)
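
For reference, thp_migration_supported() is just a config check, so it is
constant-true whenever a migration softleaf can exist at all. Quoting its
definition as I read it in include/linux/huge_mm.h (exact location may
have drifted):

	static inline bool thp_migration_supported(void)
	{
		return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
	}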
>> 
>> 2. if the migration entry changes under us, we may need to handle it at the pte level
>> 
>>    In the pmd_is_migration_entry() -> !pmd_present() branch, we have:
>> 
>> 		if (!softleaf_is_migration(entry) ||
>> 		    !check_pmd(softleaf_to_pfn(entry), pvmw))
>> 			return not_found(pvmw);
>> 		return true;
>> 
>>    But I think we need to do this:
>> 
>> 		if (softleaf_is_migration(entry)) {
>> 			if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>> 				return not_found(pvmw);
>> 			return true;
>> 		}
>> 
>>    Per my understanding, if the migration entry changes under us, we need to
>>    handle it at the pte level, just like in the pmd_trans_huge() case.
>>    Breaking the loop and returning false seems inconsistent.
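
Concretely, the !pmd_present() branch under the lock would then fall
through to the existing "split under us" unlock path. A rough sketch,
assuming the surrounding structure stays as it is today:

	if (!pmd_present(pmde)) {
		entry = softleaf_from_pmd(pmde);

		if (softleaf_is_migration(entry)) {
			if (!check_pmd(softleaf_to_pfn(entry), pvmw))
				return not_found(pvmw);
			return true;
		}
		/* no longer a migration entry: fall through */
	}
	/* ... pmd_trans_huge() handling elided ... */

	/* THP pmd was split under us: handle on pte level */
	spin_unlock(pvmw->ptl);
	pvmw->ptl = NULL;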
>> 
>> 3. add a proper check for the device private entry
>> 
>>    For a device private entry, we currently just grab the lock and return.
>>    To be consistent with the handling of pmd_trans_huge() and
>>    pmd_is_migration_entry(), we should (see the sketch after this list):
>> 
>>      * re-validate it is still device private entry after pmd_lock()
>>      * check PVMW_MIGRATION
>>      * check_pmd()
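
A rough sketch of what that could look like; whether PVMW_MIGRATION must
really reject a device private entry is part of the question, so treat
that check as an assumption:

	if (softleaf_is_device_private(entry)) {
		pvmw->ptl = pmd_lock(mm, pvmw->pmd);
		entry = softleaf_from_pmd(pmdp_get(pvmw->pmd));
		if (softleaf_is_device_private(entry)) {
			/* still device private: apply the usual checks */
			if (pvmw->flags & PVMW_MIGRATION)
				return not_found(pvmw);
			if (!check_pmd(softleaf_to_pfn(entry), pvmw))
				return not_found(pvmw);
			return true;
		}
		/* changed under us: unlock and handle on pte level */
		spin_unlock(pvmw->ptl);
		pvmw->ptl = NULL;
	}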
>> 
>> 4. consolidate pmd entry handling
>> 
>>    Per my understanding, there are 4 cases for pmd entry handling:
>> 
>>      * pmd_trans_huge()
>>      * pmd_is_migration_entry()
>>      * pmd_is_device_private_entry()
>>      * !pmd_present()
>> 
>>    Currently we handle them with mixed state checks, which complicates the
>>    logic, and the first three share similar steps (if my analysis above is
>>    correct):
>> 
>>      * grab pmd_lock()
>>      * re-validate pmde
>>      * check PVMW_MIGRATION
>>      * check_pmd
>> 
>>    Here I would like to take a bolder step: consolidate the handling of
>>    these three cases.
>> 
>>    Below is what it would look like.
>> 
>> 	pmde = pmdp_get_lockless(pvmw->pmd);
>> 
>> 	if (pmd_trans_huge(pmde) || pmd_is_valid_softleaf(pmde)) {
>> 		unsigned long pfn;
>> 		bool is_migration;
>> 		bool for_migration;
>> 
>> 		pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> 		if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd))) {
>> 			is_migration = pmd_is_migration_entry(pmde);
>> 			for_migration = !!(pvmw->flags & PVMW_MIGRATION);
>> 
>> 			if (is_migration != for_migration)
>> 				return not_found(pvmw);
>> 
>> 			if (pmd_trans_huge(pmde))
>> 				pfn = pmd_pfn(pmde);
>> 			else
>> 				pfn = softleaf_to_pfn(softleaf_from_pmd(pmde));
>> 
>> 			if (!check_pmd(pfn, pvmw))
>> 				return not_found(pvmw);
>> 
>> 			return true;
>> 		}
>> 		/* THP pmd was split under us: handle on pte level */
>> 		spin_unlock(pvmw->ptl);
>> 		pvmw->ptl = NULL;
>> 	} else if (!pmd_present(pmde)) {
>> 		if ((pvmw->flags & PVMW_SYNC) &&
>> 		    thp_vma_suitable_order(vma, pvmw->address,
>> 					   PMD_ORDER) &&
>> 		    (pvmw->nr_pages >= HPAGE_PMD_NR))
>> 			sync_with_folio_pmd_zap(mm, pvmw->pmd);
>> 
>> 		step_forward(pvmw, PMD_SIZE);
>> 		continue;
>> 	}
>> 
>> 5. use "goto retry"
>> 
>>    As you mentioned above: instead of "handle on pte level", go back to
>>    pmdp_get_lockless() and retry. This looks more reasonable to me. (A
>>    sketch follows.)
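
A minimal sketch of that retry shape, on top of the consolidated
handling in #4 (the label name is made up):

retry:
	pmde = pmdp_get_lockless(pvmw->pmd);

	if (pmd_trans_huge(pmde) || pmd_is_valid_softleaf(pmde)) {
		pvmw->ptl = pmd_lock(mm, pvmw->pmd);
		if (!pmd_same(pmde, pmdp_get(pvmw->pmd))) {
			/* pmd changed under us: re-read and re-classify */
			spin_unlock(pvmw->ptl);
			pvmw->ptl = NULL;
			goto retry;
		}
		/* same PVMW_MIGRATION and check_pmd() logic as in #4 */
	}
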
>>>
>>>> +				/* THP pmd was split under us: handle on pte level */
>>>> +				spin_unlock(pvmw->ptl);
>>>> +				pvmw->ptl = NULL;
>>>
>>>
>>>
>>>> +			} else {
>>>> +				if ((pvmw->flags & PVMW_SYNC) &&
>>>> +				    thp_vma_suitable_order(vma, pvmw->address,
>>>> +							   PMD_ORDER) &&
>>>> +				    (pvmw->nr_pages >= HPAGE_PMD_NR))
>>>> +					sync_with_folio_pmd_zap(mm, pvmw->pmd);
>>>> +
>>>> +				step_forward(pvmw, PMD_SIZE);
>>>> +				continue;
>>>>  			}
>>>> -
>>>> -			if ((pvmw->flags & PVMW_SYNC) &&
>>>> -			    thp_vma_suitable_order(vma, pvmw->address,
>>>> -						   PMD_ORDER) &&
>>>> -			    (pvmw->nr_pages >= HPAGE_PMD_NR))
>>>> -				sync_with_folio_pmd_zap(mm, pvmw->pmd);
>>>> -
>>>> -			step_forward(pvmw, PMD_SIZE);
>>>> -			continue;
>>>>  		}
>>>>  		if (!map_pte(pvmw, &pmde, &ptl)) {
>>>>  			if (!pvmw->pte)
>>>>
>>>> After this, we could simplify the logic in try_to_migrate_one() as:
>>>>
>>>> @@ -2471,14 +2471,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>>>                                  * so we can detect this scenario and properly
>>>>                                  * abort the walk.
>>>>                                  */
>>>> -                               if (split_huge_pmd_locked(vma, pvmw.address,
>>>> -                                                     pvmw.pmd, true)) {
>>>> -                                       page_vma_mapped_walk_done(&pvmw);
>>>> -                                       break;
>>>> -                               }
>>>> -                               flags &= ~TTU_SPLIT_HUGE_PMD;
>>>> -                               page_vma_mapped_walk_restart(&pvmw);
>>>> -                               continue;
>>>> +                               ret = split_huge_pmd_locked(vma, pvmw.address,
>>>> +                                                     pvmw.pmd, true);
>>>> +                               page_vma_mapped_walk_done(&pvmw);
>>>> +                               break;
>>>>                         }
>>>
>>> Right. But just to be clear: Let's split the page_vma_mapped_walk() validation
>>> (which looks like a bugfix to me) from the other optimization.
>>>
>> 
>> Sure, maybe we can split the page_vma_mapped_walk() cleanup out into a
>> separate patch set for easier review?
>
>Yes, but I assume these could even be fixes?

Agree.

Of the proposed changes above, #2 and #3 are appropriate as fixes:

 2. if the migration entry changes under us, we may need to handle it at the pte level
 3. add a proper check for the device private entry

The corresponding commits to be fixed are

   commit 616b8371539a6c487404c3b8fb04078016dab4ba
   Author: Zi Yan <zi.yan@cs.rutgers.edu>
   Date:   Fri Sep 8 16:10:57 2017 -0700
   
       mm: thp: enable thp migration in generic path

   commit 65edfda6f3f2e58f757485a056e4f1775a1404a8
   Author: Balbir Singh <balbirs@nvidia.com>
   Date:   Wed Oct 1 16:56:55 2025 +1000

       mm/rmap: extend rmap and migration support device-private entries

As Andrew suggested, I will send the fixes and the cleanup separately.

Once all of that has settled, I will respin this series.

-- 
Wei Yang
Help you, Help me

