From: Lance Yang <lance.yang@linux.dev>
To: david@kernel.org
Cc: richard.weiyang@gmail.com, akpm@linux-foundation.org,
ljs@kernel.org, riel@surriel.com, liam@infradead.org,
vbabka@kernel.org, harry@kernel.org, jannh@google.com,
ziy@nvidia.com, sj@kernel.org, balbirs@nvidia.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, lance.yang@linux.dev
Subject: Re: [Patch mm-hotfixes v4] mm/page_vma_mapped: fix device-private PMD handling
Date: Fri, 26 Jun 2026 21:27:28 +0800
Message-ID: <20260626132728.77436-1-lance.yang@linux.dev>
In-Reply-To: <d060cadd-34f8-42da-b7f7-c8d295050436@kernel.org>
On Fri, Jun 26, 2026 at 12:07:56PM +0200, David Hildenbrand (Arm) wrote:
>On 6/24/26 08:53, Wei Yang wrote:
>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>> device-private entries") introduced the concept of device-private
>> PMD entries, but did not correctly update the rmap walk code to
>> account for them.
>>
>> As a result, when page_vma_mapped_walk() encounters device-private
>> PMD entries, it takes no action other than to acquire the PMD lock
>> and exit.
>>
>> However, this is highly problematic for two reasons. Firstly,
>> device-private entries possess a PFN, so check_pmd() needs to be
>> called to ensure that the entry's PFN range overlaps the range
>> being walked.
>>
>> Secondly, and more importantly, if PVMW_MIGRATION is set, the
>> caller assumes the returned entry is a migration entry, resulting
>> in memory corruption when the caller tries to interpret the
>> device-private entry as such.
>>
>> In addition, commit 146287290023 ("mm/huge_memory: implement
>> device-private THP splitting") allowed device-private PMDs to be
>> split like THP mappings, but again did not update this code path.
>>
>> As a result, we might race a PMD split prior to acquiring the PMD
>> lock.
>>
>> This patch addresses all of these issues by invoking check_pmd(),
>> ensuring PVMW_MIGRATION is not set, and checking whether a split
>> raced us, as we do for PMD THP and migration entries.
>>
>> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Suggested-by: David Hildenbrand <david@kernel.org>
>> Cc: David Hildenbrand <david@kernel.org>
>> Cc: Balbir Singh <balbirs@nvidia.com>
>> Cc: SeongJae Park <sj@kernel.org>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Lorenzo Stoakes <ljs@kernel.org>
>> Cc: Lance Yang <lance.yang@linux.dev>
>>
>> ---
>> v4:
>> * refine subject and commit log based on Lorenzo's suggestion
>> * put pmd device-private entry handling in its own if branch,
>> suggested by Lorenzo
>>
>> v3:
>> * remove cleanup part, only fix the issue for device-private entry
>> * refine user effect description based on Lorenzo's suggestion
>>
>> v2: https://lore.kernel.org/all/20260616063436.20455-1-richard.weiyang@gmail.com/T/#u
>> * specify the possible error case of current code and user visible effect
>> * besides fix, cleanup the pmd entry handling based on David's suggestion
>>
>> v1: https://lore.kernel.org/linux-mm/20260508013728.21285-1-richard.weiyang@gmail.com/
>> ---
>> mm/page_vma_mapped.c | 20 +++++++++++++++-----
>> 1 file changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>> index 2ccbabfb2cc1..17dff8aab9f9 100644
>> --- a/mm/page_vma_mapped.c
>> +++ b/mm/page_vma_mapped.c
>> @@ -269,14 +269,24 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>> /* THP pmd was split under us: handle on pte level */
>> spin_unlock(pvmw->ptl);
>> pvmw->ptl = NULL;
>> - } else if (!pmd_present(pmde)) {
>> - const softleaf_t entry = softleaf_from_pmd(pmde);
>> + } else if (pmd_is_device_private_entry(pmde)) {
>> + softleaf_t entry;
>> +
>> + pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> + pmde = *pvmw->pmd;
>> + entry = softleaf_from_pmd(pmde);
>>
>> - if (softleaf_is_device_private(entry)) {
>> - pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> + if (likely(softleaf_is_device_private(entry))) {
>> + if (pvmw->flags & PVMW_MIGRATION)
>> + return not_found(pvmw);
>> + if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>> + return not_found(pvmw);
>> return true;
>> }
>> -
>> + /* device-private pmd was split under us: handle on pte level */
>> + spin_unlock(pvmw->ptl);
>> + pvmw->ptl = NULL;
>> + } else if (!pmd_present(pmde)) {
>> if ((pvmw->flags & PVMW_SYNC) &&
>> thp_vma_suitable_order(vma, pvmw->address,
>> PMD_ORDER) &&
>
>This is extremely hard to review given the existing crap handling here. I'm
>really sorry, but it makes my head hurt (I'm not kidding :) ).
>
>It's completely unclear why we only have to check for a subset of the cases
>after taking the lock.
>
>Could we simply extend the existing migration pmd handling and leave the
>!pmd_present() case for pmd_none()?
>
>That leaves no question to "which transitions are actually allowed", including
>"could we accidentally assume something is a page table when really it isn't".
>
>
>So what about something like the following?
>
>The "thp_migration_supported()" is not required when checking for
>pmd_is_migration_entry(), as that defaults to "false" when not compiled in.
>
>Untested:
>
>
>From 048ecd33673ec649e168fbbb97749a7c0e344fcd Mon Sep 17 00:00:00 2001
>From: "David Hildenbrand (Arm)" <david@kernel.org>
>Date: Fri, 26 Jun 2026 12:03:40 +0200
>Subject: [PATCH] tmp
>
>Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
>---
> mm/page_vma_mapped.c | 29 +++++++++++++++++------------
> 1 file changed, 17 insertions(+), 12 deletions(-)
>
>diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>index 2ccbabfb2cc17..ed2a23a90e8dd 100644
>--- a/mm/page_vma_mapped.c
>+++ b/mm/page_vma_mapped.c
>@@ -243,21 +243,31 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> */
> pmde = pmdp_get_lockless(pvmw->pmd);
>
>- if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde)) {
>+ if (pmd_trans_huge(pmde) || pmd_is_migration_entry(pmde) ||
>+ pmd_is_device_private_entry(pmde)) {
> pvmw->ptl = pmd_lock(mm, pvmw->pmd);
> pmde = *pvmw->pmd;
>- if (!pmd_present(pmde)) {
>+ if (pmd_is_migration_entry(pmde)) {
> softleaf_t entry;
>
>- if (!thp_migration_supported() ||
>- !(pvmw->flags & PVMW_MIGRATION))
>+ if (!(pvmw->flags & PVMW_MIGRATION))
> return not_found(pvmw);
> entry = softleaf_from_pmd(pmde);
>+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
>+ return not_found(pvmw);
>+ return true;
>+ } else if (pmd_is_device_private_entry(pmde)) {
>+ softleaf_t entry;
>
>- if (!softleaf_is_migration(entry) ||
>- !check_pmd(softleaf_to_pfn(entry), pvmw))
>+ if (pvmw->flags & PVMW_MIGRATION)
>+ return not_found(pvmw);
>+ entry = softleaf_from_pmd(pmde);
>+ if (!check_pmd(softleaf_to_pfn(entry), pvmw))
> return not_found(pvmw);
> return true;
>+ } else if (!pmd_present(pmde)) {
>+ return not_found(pvmw);
> }
> if (likely(pmd_trans_huge(pmde))) {
> if (pvmw->flags & PVMW_MIGRATION)
>@@ -270,12 +280,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> spin_unlock(pvmw->ptl);
> pvmw->ptl = NULL;
> } else if (!pmd_present(pmde)) {
>- const softleaf_t entry = softleaf_from_pmd(pmde);
>-
>- if (softleaf_is_device_private(entry)) {
>- pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>- return true;
>- }
>
> if ((pvmw->flags & PVMW_SYNC) &&
> thp_vma_suitable_order(vma, pvmw->address,
>--
It might be good to apply this on top:
---8<---
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index cfa1230c87bb..8b7c062bd81d 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -281,7 +281,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return not_found(pvmw);
return true;
}
- /* THP pmd was split under us: handle on pte level */
+ /* THP/device-private pmd was split under us: handle on pte level */
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
} else if (!pmd_present(pmde)) {
--
Looks good to me as well, thanks!