public inbox for linux-mm@kvack.org
From: Wei Yang <richard.weiyang@gmail.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>,
	akpm@linux-foundation.org, ljs@kernel.org, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
	npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
	baohua@kernel.org, lance.yang@linux.dev, riel@surriel.com,
	vbabka@kernel.org, harry@kernel.org, jannh@google.com,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	shuah@kernel.org, linux-mm@kvack.org,
	Gavin Guo <gavinguo@igalia.com>
Subject: Re: [PATCH 1/2] mm/huge_memory: return true if split_huge_pmd_locked() split PMD to migration entry
Date: Sun, 3 May 2026 00:38:18 +0000	[thread overview]
Message-ID: <20260503003818.t35q5roc7osx6se2@master> (raw)
In-Reply-To: <413feed4-6aab-43d9-b7e5-a9386fa79f4b@kernel.org>

On Wed, Apr 29, 2026 at 08:55:07AM +0200, David Hildenbrand (Arm) wrote:
>On 4/29/26 04:49, Wei Yang wrote:
>> On Tue, Apr 28, 2026 at 10:24:42AM +0200, David Hildenbrand (Arm) wrote:
>>> On 4/26/26 11:19, Wei Yang wrote:
>>>>
>>>> Will adjust related places.
>>>>
>>>>
>>>> Got it.
>>>>
>>>>
>>>> Here is my understanding.
>>>>
>>>> We get here when page_vma_mapped_walk() touches a pmd entry, in one of
>>>> three cases:
>>>>
>>>>   * pmd_trans_huge()
>>>>   * pmd_is_migration_entry()
>>>>   * pmd_is_device_private_entry()
>>>>
>>>> For the first two cases, we grab pmd_lock() and then check that the
>>>> condition is still valid before returning. But for case 3, after grabbing
>>>> pmd_lock(), it returns directly.
>>>>
>>>> This may give another thread a chance to split the device-private PMD
>>>> into pte mappings, IIUC. For this case, we should restart the walk here.
>>>
>>>
>>> So what you are saying is that we should re-validate in page_vma_mapped_walk()
>>> that we indeed still have a device-private entry after grabbing the lock?
>>>
>>> That's what we do in map_pte() through the pmd_same() check.
>>>
>>> Likely we should apply the same model here!
>>>
>> 
>> Below is my proposed change:
>> 
>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>> index a4d52fdb3056..6e915d35ae54 100644
>> --- a/mm/page_vma_mapped.c
>> +++ b/mm/page_vma_mapped.c
>> @@ -273,17 +273,21 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>  
>>  			if (softleaf_is_device_private(entry)) {
>>  				pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> -				return true;
>> +				if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd)))
>> +					return true;
>
>As we have a softleaf entry, I assume we wouldn't expect to get any other bits
>(access/dirty) set until we grab the lock. Verifying
>softleaf_is_device_private() again would be cleaner, though.
>

Got it.

>But really, I do wonder if we should just have a "goto retry" back to the "pmde
>= pmdp_get_lockless(pvmw->pmd);" instead?
>

Sounds reasonable. See below.

>
>And now I wonder why we don't have check_pmd() handling in there? :/
>
>Should we check for the pfn here?

Thanks for pointing that out. I think you are right.

After re-reading the code, more questions came to mind. I am afraid we need
more cleanup in page_vma_mapped_walk().

Below are my findings, based on my current understanding:

1. The thp_migration_supported() check seems unnecessary

   Reaching this code means pmd_is_migration_entry() returned true, which
   means CONFIG_ARCH_ENABLE_THP_MIGRATION is set; otherwise
   softleaf_from_pmd() would have returned softleaf_mk_none(), which is not
   a migration softleaf.

   CONFIG_ARCH_ENABLE_THP_MIGRATION being set in turn means
   CONFIG_TRANSPARENT_HUGEPAGE is set, so thp_migration_supported() must
   return true.
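   For reference, this matches how thp_migration_supported() is defined
   (paraphrased from include/linux/huge_mm.h as I read it; I have not
   re-checked the latest tree):

```
static inline bool thp_migration_supported(void)
{
	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
}
```

   So on any config where a migration softleaf can exist at all, this is
   compile-time true and the runtime check is dead code.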

2. If the migration entry changes under us, handle it at the pte level

   In the pmd_is_migration_entry() -> !pmd_present() branch, we have:

		if (!softleaf_is_migration(entry) ||
		    !check_pmd(softleaf_to_pfn(entry), pvmw))
			return not_found(pvmw);
		return true;

   But I think we need to do this instead:

		if (softleaf_is_migration(entry)) {
			if (!check_pmd(softleaf_to_pfn(entry), pvmw))
				return not_found(pvmw);
			return true;
		}

   Per my understanding, if the migration entry changes under us, we need
   to handle it at the pte level, just like the pmd_trans_huge() case.
   Breaking the loop and returning false seems inconsistent.

3. Add proper checks for the device-private entry

   For a device-private entry, we currently just grab the lock and return.
   But to be consistent with the handling of pmd_trans_huge() and
   pmd_is_migration_entry(), we should:

     * re-validate that it is still a device-private entry after pmd_lock()
     * check PVMW_MIGRATION
     * call check_pmd()
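   Concretely, the branch might look like the following (just a sketch, not
   compiled; the softleaf_*() names are taken from the diff above, and the
   assumption that a device-private entry should be rejected under
   PVMW_MIGRATION is mine):

```
		if (softleaf_is_device_private(entry)) {
			pvmw->ptl = pmd_lock(mm, pvmw->pmd);
			/* re-validate under the lock, like map_pte() does */
			if (!pmd_same(pmde, pmdp_get_lockless(pvmw->pmd))) {
				/* changed under us: drop the lock and
				 * handle on the pte level (or retry,
				 * see point 5) */
				spin_unlock(pvmw->ptl);
				pvmw->ptl = NULL;
			} else {
				/* assumption: a device-private entry is
				 * not a migration entry */
				if (pvmw->flags & PVMW_MIGRATION)
					return not_found(pvmw);
				if (!check_pmd(softleaf_to_pfn(entry), pvmw))
					return not_found(pvmw);
				return true;
			}
		}
```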

4. Consolidate pmd entry handling

   Per my understanding, there are four cases of pmd entry to handle:

     * pmd_trans_huge()
     * pmd_is_migration_entry()
     * pmd_is_device_private_entry()
     * !pmd_present()

   Right now we handle them with mixed state checks, which complicates the
   logic. And the first three share similar steps (if my analysis above is
   correct):

     * grab pmd_lock()
     * re-validate pmde
     * check PVMW_MIGRATION
     * check_pmd

   Here I would like to take a bolder step: consolidate the handling of
   these three cases.

   Below is what it would look like.

	pmde = pmdp_get_lockless(pvmw->pmd);

	if (pmd_trans_huge(pmde) || pmd_is_valid_softleaf(pmde)) {
		unsigned long pfn;
		bool is_migration;
		bool for_migration;

		pvmw->ptl = pmd_lock(mm, pvmw->pmd);
		if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd))) {
			is_migration = pmd_is_migration_entry(pmde);
			for_migration = !!(pvmw->flags & PVMW_MIGRATION);

			if (is_migration != for_migration)
				return not_found(pvmw);

			if (pmd_trans_huge(pmde))
				pfn = pmd_pfn(pmde);
			else
				pfn = softleaf_to_pfn(softleaf_from_pmd(pmde));

			if (!check_pmd(pfn, pvmw))
				return not_found(pvmw);

			return true;
		}
		/* THP pmd was split under us: handle on pte level */
		spin_unlock(pvmw->ptl);
		pvmw->ptl = NULL;
	} else if (!pmd_present(pmde)) {
		if ((pvmw->flags & PVMW_SYNC) &&
		    thp_vma_suitable_order(vma, pvmw->address,
					   PMD_ORDER) &&
		    (pvmw->nr_pages >= HPAGE_PMD_NR))
			sync_with_folio_pmd_zap(mm, pvmw->pmd);

		step_forward(pvmw, PMD_SIZE);
		continue;
	}

5. Use "goto retry"

   As you mentioned above: instead of "handle on pte level", go back to
   pmdp_get_lockless() and retry. This looks more reasonable to me.
>
>> +				/* THP pmd was split under us: handle on pte level */
>> +				spin_unlock(pvmw->ptl);
>> +				pvmw->ptl = NULL;
>
>
>
>> +			} else {
>> +				if ((pvmw->flags & PVMW_SYNC) &&
>> +				    thp_vma_suitable_order(vma, pvmw->address,
>> +							   PMD_ORDER) &&
>> +				    (pvmw->nr_pages >= HPAGE_PMD_NR))
>> +					sync_with_folio_pmd_zap(mm, pvmw->pmd);
>> +
>> +				step_forward(pvmw, PMD_SIZE);
>> +				continue;
>>  			}
>> -
>> -			if ((pvmw->flags & PVMW_SYNC) &&
>> -			    thp_vma_suitable_order(vma, pvmw->address,
>> -						   PMD_ORDER) &&
>> -			    (pvmw->nr_pages >= HPAGE_PMD_NR))
>> -				sync_with_folio_pmd_zap(mm, pvmw->pmd);
>> -
>> -			step_forward(pvmw, PMD_SIZE);
>> -			continue;
>>  		}
>>  		if (!map_pte(pvmw, &pmde, &ptl)) {
>>  			if (!pvmw->pte)
>> 
>> After this, we could simplify the logic in try_to_migrate_one() as:
>> 
>> @@ -2471,14 +2471,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>                                  * so we can detect this scenario and properly
>>                                  * abort the walk.
>>                                  */
>> -                               if (split_huge_pmd_locked(vma, pvmw.address,
>> -                                                     pvmw.pmd, true)) {
>> -                                       page_vma_mapped_walk_done(&pvmw);
>> -                                       break;
>> -                               }
>> -                               flags &= ~TTU_SPLIT_HUGE_PMD;
>> -                               page_vma_mapped_walk_restart(&pvmw);
>> -                               continue;
>> +                               ret = split_huge_pmd_locked(vma, pvmw.address,
>> +                                                     pvmw.pmd, true);
>> +                               page_vma_mapped_walk_done(&pvmw);
>> +                               break;
>>                         }
>
>Right. But just to be clear: Let's split the page_vma_mapped_walk() validation
>(which looks like a bugfix to me) from the other optimization.
>

Sure, maybe we can split the page_vma_mapped_walk() cleanup out into a
separate patch set for easier review?

>-- 
>Cheers,
>
>David

-- 
Wei Yang
Help you, Help me


Thread overview: 11+ messages
2026-04-15  1:08 [PATCH 0/2] mm/huge_memory: optimize migration when huge PMD needs split Wei Yang
2026-04-15  1:08 ` [PATCH 1/2] mm/huge_memory: return true if split_huge_pmd_locked() split PMD to migration entry Wei Yang
2026-04-24 19:29   ` David Hildenbrand (Arm)
2026-04-26  9:19     ` Wei Yang
2026-04-28  8:24       ` David Hildenbrand (Arm)
2026-04-29  2:49         ` Wei Yang
2026-04-29  6:55           ` David Hildenbrand (Arm)
2026-05-03  0:38             ` Wei Yang [this message]
2026-05-04 12:44               ` David Hildenbrand (Arm)
2026-05-05  3:15                 ` Wei Yang
2026-04-15  1:08 ` [PATCH 2/2] mm/selftests: add split_shared_pmd() Wei Yang
