All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wei Yang <richard.weiyang@gmail.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: Lance Yang <lance.yang@linux.dev>,
	akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org,
	riel@surriel.com, liam@infradead.org, vbabka@kernel.org,
	harry@kernel.org, jannh@google.com, balbirs@nvidia.com,
	ziy@nvidia.com, sj@kernel.org, linux-mm@kvack.org,
	lorenzo.stoakes@oracle.com, stable@vger.kernel.org
Subject: Re: [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd
Date: Fri, 19 Jun 2026 02:30:25 +0000	[thread overview]
Message-ID: <20260619023025.vqx2dsitxffuuwh3@master> (raw)
In-Reply-To: <20260617081815.kq6g3rjtomudxca5@master>

On Wed, Jun 17, 2026 at 08:18:15AM +0000, Wei Yang wrote:
>On Wed, Jun 17, 2026 at 10:32:11AM +0800, Lance Yang wrote:
>>
>>On Tue, Jun 16, 2026 at 11:50:22PM +0000, Wei Yang wrote:
>>>On Tue, Jun 16, 2026 at 08:30:01PM +0800, Lance Yang wrote:
>>>>
>>>>On Tue, Jun 16, 2026 at 06:34:36AM +0000, Wei Yang wrote:
>>>>[...]
>>>>>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>>>>index 2ccbabfb2cc1..21635fab209c 100644
>>>>>--- a/mm/page_vma_mapped.c
>>>>>+++ b/mm/page_vma_mapped.c
>>>>>@@ -243,40 +243,28 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>>>> 		 */
>>>>> 		pmde = pmdp_get_lockless(pvmw->pmd);
>>>>> 
>>>>>-		if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
>>>>>-			pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>>>>-			pmde = *pvmw->pmd;
>>>>>-			if (!pmd_present(pmde)) {
>>>>>-				softleaf_t entry;
>>>>>-
>>>>>-				if (!thp_migration_supported() ||
>>>>>-				    !(pvmw->flags & PVMW_MIGRATION))
>>>>>-					return not_found(pvmw);
>>>>>-				entry = softleaf_from_pmd(pmde);
>>>>>-
>>>>>-				if (!softleaf_is_migration(entry) ||
>>>>>-				    !check_pmd(softleaf_to_pfn(entry), pvmw))
>>>>>-					return not_found(pvmw);
>>>>>-				return true;
>>>>>-			}
>>>>>-			if (likely(pmd_trans_huge(pmde))) {
>>>>>-				if (pvmw->flags & PVMW_MIGRATION)
>>>>>-					return not_found(pvmw);
>>>>>-				if (!check_pmd(pmd_pfn(pmde), pvmw))
>>>>>-					return not_found(pvmw);
>>>>>-				return true;
>>>>>-			}
>>>>>-			/* THP pmd was split under us: handle on pte level */
>>>>>-			spin_unlock(pvmw->ptl);
>>>>>-			pvmw->ptl = NULL;
>>>>>-		} else if (!pmd_present(pmde)) {
>>>>>-			const softleaf_t entry = softleaf_from_pmd(pmde);
>>>>>-
>>>>>-			if (softleaf_is_device_private(entry)) {
>>>>>-				pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>>>>>-				return true;
>>>>>-			}
>>>>>+		if (pmd_present(pmde)) {
>>>>>+			if (!pmd_leaf(pmde))
>>>>>+				goto pte_table;
>>>>>+			if (pvmw->flags & PVMW_MIGRATION)
>>>>>+				return not_found(pvmw);
>>>>>+			if (!check_pmd(pmd_pfn(pmde), pvmw))
>>>>>+				return not_found(pvmw);
>>>>>+		} else if (pmd_is_migration_entry(pmde)) {
>>>>>+			softleaf_t entry = softleaf_from_pmd(pmde);
>>>>>+
>>>>>+			if (!(pvmw->flags & PVMW_MIGRATION))
>>>>>+				return not_found(pvmw);
>>>>
>>>>Looked at history a bit, and I wonder if this changed something old
>>>>here ...
>>>>
>>>>Since 616b8371539a ("mm: thp: enable thp migration in generic path"), PMD
>>>>migration handling took PTL before doing PVMW_MIGRATION/PFN checks,
>>>>including not_found() cases. So lockless PMD read was just a filter ...
>>>>
>>>>With this fix, true case gets final pmd_same() check, but this
>>>>not_found() case happens before taking PTL.
>>>>
>>>>So a !PVMW_MIGRATION walker could race with someone, e.g.
>>>>remove_migration_pmd(): we make the not_found() decision from old PMD
>>>>value that still says "migration", while real *pvmw->pmd may already be
>>>>present again. We return without ever taking PTL :)
>>>>
>>>
>>>Hi, Lance
>>>
>>>Thanks for take a look.
>>>
>>>I am trying to understand the scenario you mentioned. Let's say A migrate a
>>>pmd and B want to unmap the pmd.
>>>
>>>            A                                        B
>>>
>>>  try to migrate a pmd
>>>  pmd is set to migration entry
>>>                                           unmap the pmd ...
>>>  managed to finish migration
>>>                                           ...still see migration entry,
>>>                                           so skipped and unmap fail
>>>
>>>Would this be a timing case? Even B grab the PTL, it still could see migration
>>>entry if B visit pmd before A finish migration.
>>>
>>>Maybe I miss something, look forward your insight.
>>
>>Right, seeing migration entry while migration is still ongoing is fine.
>>
>>What I meant was this ordering:
>>
>>  CPU 0: pmde = pmdp_get_lockless(...); /* migration */
>>  CPU 1: remove_migration_pmd() restores PMD to present
>>  CPU 0: returns not_found() from old pmde, without ever taking PTL and
>>         rechecking *pvmw->pmd
>>
>>So issue is not seeing migration entry itself, but making final
>>not_found() decision from stale lockless PMD value ...
>>
>>Before this patch, PMD migration case took PTL before making that
>>decision ...
>>
>
>Yes, this patch changes the decision making condition for pmd entry. Thanks
>for pointing out.
>
>Hmm... I took another look into current pte handling and find for pte entry,
>we did two phase check:
>
>  * map_pte() without ptl
>  * check_pte() with ptl
>
>While check_pte() do extra pfn range check, map_pte() doesn't.
>
>This means for pte entry, we may face the same situation as you describe: 
>make the decision before grab PTL. Till now, it looks reasonable.
>
>But one thing jumped at me, PVMW_SYNC. When this flag is specified, all check
>is done under PTL. But now for pmd entry, we don't have a chance to do so.
>
>And as the comment says in try_to_migrate_one()
>
>	/*
>	 * When racing against e.g. zap_pte_range() on another cpu,
>	 * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
>	 * try_to_migrate() may return before folio_mapped() has become false,
>	 * if page table locking is skipped: use TTU_SYNC to wait for that.
>	 */
>
>I tracked down to commit a98a2f0c8ce1 ('mm/rmap: split migration into its own
>function'), but not getting more detail on reasoning. Not fully understand it
>yet, but it seems there is some race between migration and unmap which is
>protected by PTL?
>
>Will look into this to get more detail.
>

After going through the history, I found this:

   commit 732ed55823fc3ad998d43b86bf771887bcc5ec67
   Author: Hugh Dickins <hughd@google.com>
   Date:   Tue Jun 15 18:23:53 2021 -0700
   
       mm/thp: try_to_unmap() use TTU_SYNC for safe splitting

This one fix the race mentioned above: we expect mapcount is 0, but is not.


IIUC, if we apply the change in this patch, the affected case is
pmd_is_migration_entry(). In case someone else has cleared it but not update
mapcount yet, try_to_migrate() would return before folio_mapped() is false.

Thanks Lance for raise the question.

If above analysis is true, I haven't got a neat way to take this into
consideration.

BTW, for a fix, I am thinking to keep it simple and direct. So how about leave
the refactor as a followup cleanup?

-- 
Wei Yang
Help you, Help me


  reply	other threads:[~2026-06-19  2:30 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260616063436.20455-1-richard.weiyang@gmail.com>
2026-06-16 12:30 ` [Patch v2] mm/page_vma_mapped: revalidate and do proper check before return device-private pmd Lance Yang
2026-06-16 23:50   ` Wei Yang
2026-06-17  2:32     ` Lance Yang
2026-06-17  8:18       ` Wei Yang
2026-06-19  2:30         ` Wei Yang [this message]
2026-06-19 12:19           ` Lance Yang
2026-06-20  2:02             ` Wei Yang
     [not found]   ` <666dc40b-e37a-46eb-af55-7a81bc1643f1@kernel.org>
2026-06-17  2:11     ` Balbir Singh
2026-06-17  3:14       ` Lance Yang
2026-06-19 10:44 ` Lorenzo Stoakes
2026-06-19 10:48   ` David Hildenbrand (Arm)
2026-06-19 11:04     ` Lorenzo Stoakes
2026-06-20  2:13     ` Wei Yang
2026-06-20  2:11   ` Wei Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260619023025.vqx2dsitxffuuwh3@master \
    --to=richard.weiyang@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbirs@nvidia.com \
    --cc=david@kernel.org \
    --cc=harry@kernel.org \
    --cc=jannh@google.com \
    --cc=lance.yang@linux.dev \
    --cc=liam@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=riel@surriel.com \
    --cc=sj@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.