* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 8:30 [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry Lorenzo Stoakes
@ 2026-06-01 8:55 ` Lorenzo Stoakes
2026-06-01 12:17 ` David Hildenbrand (Arm)
2026-06-01 15:50 ` Dev Jain
` (7 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Lorenzo Stoakes @ 2026-06-01 8:55 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
SeongJae Park, Balbir Singh, linux-mm, linux-kernel
On Mon, Jun 01, 2026 at 09:30:44AM +0100, Lorenzo Stoakes wrote:
> mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
> 1 file changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bf9b480bb3b0..79463c709c98 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> struct vm_area_struct *vma = pvmw->vma;
> struct mm_struct *mm = vma->vm_mm;
> unsigned long address = pvmw->address;
> - bool anon_exclusive;
> + bool anon_exclusive, present, writable, softdirty, uffd_wp;
> pmd_t pmdval;
> swp_entry_t entry;
> pmd_t pmdswp;
> @@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> if (!(pvmw->pmd && !pvmw->pte))
> return 0;
>
> - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> - if (unlikely(!pmd_present(*pvmw->pmd)))
> - pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> - else
> + present = pmd_present(*pvmw->pmd);
> + if (likely(present)) {
> + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
Oh I forgot to mention in the commit message that I moved
flush_cache_range() into the present branch, as it's not meaningful for a
softleaf (i.e. non-present) entry.
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 8:55 ` Lorenzo Stoakes
@ 2026-06-01 12:17 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-01 12:17 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, SeongJae Park, Balbir Singh,
linux-mm, linux-kernel
On 6/1/26 10:55, Lorenzo Stoakes wrote:
> On Mon, Jun 01, 2026 at 09:30:44AM +0100, Lorenzo Stoakes wrote:
>> mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
>> 1 file changed, 33 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index bf9b480bb3b0..79463c709c98 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>> struct vm_area_struct *vma = pvmw->vma;
>> struct mm_struct *mm = vma->vm_mm;
>> unsigned long address = pvmw->address;
>> - bool anon_exclusive;
>> + bool anon_exclusive, present, writable, softdirty, uffd_wp;
>> pmd_t pmdval;
>> swp_entry_t entry;
>> pmd_t pmdswp;
>> @@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>> if (!(pvmw->pmd && !pvmw->pte))
>> return 0;
>>
>> - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
>> - if (unlikely(!pmd_present(*pvmw->pmd)))
>> - pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
>> - else
>> + present = pmd_present(*pvmw->pmd);
>> + if (likely(present)) {
>> + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
>
> Oh I forgot to mention in the commit message that I moved
> flush_cache_range() into the present branch, as it's not meaningful for a
> softleaf (i.e. non-present) entry.
Nothing jumped out at me, so LGTM.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 8:30 [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry Lorenzo Stoakes
2026-06-01 8:55 ` Lorenzo Stoakes
@ 2026-06-01 15:50 ` Dev Jain
2026-06-01 15:56 ` David Hildenbrand (Arm)
2026-06-01 16:01 ` Lorenzo Stoakes
2026-06-01 16:44 ` Dev Jain
` (6 subsequent siblings)
8 siblings, 2 replies; 19+ messages in thread
From: Dev Jain @ 2026-06-01 15:50 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Barry Song, Lance Yang, SeongJae Park,
Balbir Singh, linux-mm, linux-kernel
On 01/06/26 2:00 pm, Lorenzo Stoakes wrote:
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") updated set_pmd_migration_entry() to use
> pmdp_huge_get_and_clear() in the softleaf case, but made no further
> adjustments to the function itself.
>
> Therefore this function continues to incorrectly use pmd_write(),
> pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> migration entry should be marked writable, softdirty or uffd-wp
> respectively.
>
> Whilst all are incorrect, the most problematic of these is pmd_write(), as
> this can lead to corrupted rmap state.
>
> On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> pmd_write() on a softleaf will return the softdirty state encoded in the
> entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
>
> This was observed when running the hmm.hmm_device_private.anon_write_child
> selftest:
>
> 1. The test faults in a range then migrates it such that a device-private
> THP range is established.
>
> 2. The parent then migrates it to a device-private writable PMD entry whose
> folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> (accidentally correct write state).
>
> 3. The parent forks and the PMD entries are set to device-private read only
> entries, entire_mapcount=2, softdirty still set.
>
> 4. [BUG] The child writes to the range then migrates to RAM - intending to
> install non-writable migration entries - but replacing parent and child
> PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> bit.
>
> 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> both parent and child, which are therefore AnonExclusive.
>
> 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> entire_mapcount=2 and we end up with an AnonExclusive folio with
> entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
>
> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> folio_entire_mapcount(folio) > 1 &&
> PageAnonExclusive(cur_page), folio)
>
> This patch fixes the issue by correctly referencing the softleaf entry
> fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
>
> It also only updates A/D flags if the entry is present as these are
> otherwise not meaningful for a softleaf entry.
>
> This patch also flips the if (!present) { ... } else { ... } logic in
> set_pmd_migration_entry() so it is easier to understand, and adds some
> comments to make things clearer.
>
> I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> device private THP test infrastructure") which first exposes this bug as it
> was the commit that permitted test_hmm to generate the test.
>
> However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") is the commit that actually enabled this
> behaviour.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> ---
Sashiko continues to find existing problems :) What do you think:
https://sashiko.dev/#/patchset/20260601083044.57132-1-ljs%40kernel.org
> mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
> 1 file changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bf9b480bb3b0..79463c709c98 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> struct vm_area_struct *vma = pvmw->vma;
> struct mm_struct *mm = vma->vm_mm;
> unsigned long address = pvmw->address;
> - bool anon_exclusive;
> + bool anon_exclusive, present, writable, softdirty, uffd_wp;
> pmd_t pmdval;
> swp_entry_t entry;
> pmd_t pmdswp;
> @@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> if (!(pvmw->pmd && !pvmw->pte))
> return 0;
>
> - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> - if (unlikely(!pmd_present(*pvmw->pmd)))
> - pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> - else
> + present = pmd_present(*pvmw->pmd);
> + if (likely(present)) {
> + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> +
> pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
>
> + writable = pmd_write(pmdval);
> + softdirty = pmd_soft_dirty(pmdval);
> + uffd_wp = pmd_uffd_wp(pmdval);
> + } else {
> + softleaf_t old_entry;
> +
> + pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> + old_entry = softleaf_from_pmd(pmdval);
> +
> + writable = softleaf_is_device_private_write(old_entry);
> + softdirty = pmd_swp_soft_dirty(pmdval);
> + uffd_wp = pmd_swp_uffd_wp(pmdval);
> + }
> +
> /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
> anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
> if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) {
> @@ -5003,24 +5017,31 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> return -EBUSY;
> }
>
> - if (pmd_dirty(pmdval))
> - folio_mark_dirty(folio);
> - if (pmd_write(pmdval))
> + /* Determine type of migration entry. */
> + if (writable)
> entry = make_writable_migration_entry(page_to_pfn(page));
> else if (anon_exclusive)
> entry = make_readable_exclusive_migration_entry(page_to_pfn(page));
> else
> entry = make_readable_migration_entry(page_to_pfn(page));
> - if (pmd_young(pmdval))
> +
> + /* Set A/D bits as necessary. */
> + if (present && pmd_young(pmdval))
> entry = make_migration_entry_young(entry);
> - if (pmd_dirty(pmdval))
> + if (present && pmd_dirty(pmdval)) {
> + folio_mark_dirty(folio);
> entry = make_migration_entry_dirty(entry);
> + }
> +
> + /* Set PMD. */
> pmdswp = swp_entry_to_pmd(entry);
> - if (pmd_soft_dirty(pmdval))
> + if (softdirty)
> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> - if (pmd_uffd_wp(pmdval))
> + if (uffd_wp)
> pmdswp = pmd_swp_mkuffd_wp(pmdswp);
> set_pmd_at(mm, address, pvmw->pmd, pmdswp);
> +
> + /* Migration entry installed: cleanup rmap, folio. */
> folio_remove_rmap_pmd(folio, page, vma);
> folio_put(folio);
> trace_set_migration_pmd(address, pmd_val(pmdswp));
> --
> 2.54.0
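The aliasing described in the quoted commit message can be reproduced in a small user-space model. This is only a sketch: the MODEL_* macros and model_* helpers are hypothetical names invented for illustration, not kernel definitions. The one fact taken from the thread is that the swap soft-dirty bit shares its position with the RW bit on x86-64; the softleaf_writable parameter stands in for whatever softleaf_is_device_private_write() would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the kernel's macros. */
#define MODEL_PAGE_PRESENT        (1ULL << 0)
#define MODEL_PAGE_RW             (1ULL << 1)
/* Assumption taken from the commit message: same bit as RW. */
#define MODEL_PAGE_SWP_SOFT_DIRTY MODEL_PAGE_RW

typedef uint64_t model_pmd_t;

static bool model_pmd_present(model_pmd_t pmd)
{
	return (pmd & MODEL_PAGE_PRESENT) != 0;
}

/* Only meaningful for present entries. */
static bool model_pmd_write(model_pmd_t pmd)
{
	return (pmd & MODEL_PAGE_RW) != 0;
}

/* Pre-patch shape: the present-entry accessor is used unconditionally. */
static bool model_writable_before(model_pmd_t pmd, bool softleaf_writable)
{
	(void)softleaf_writable;	/* ignored, which is the bug */
	return model_pmd_write(pmd);
}

/* Post-patch shape: the accessor is chosen by whether the entry is present. */
static bool model_writable_after(model_pmd_t pmd, bool softleaf_writable)
{
	if (model_pmd_present(pmd))
		return model_pmd_write(pmd);
	return softleaf_writable;
}
```

Passing a non-present, read-only entry that has only the soft-dirty bit set, the "before" helper reports it as writable while the "after" helper does not, which matches step 4 of the reproduction.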
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 15:50 ` Dev Jain
@ 2026-06-01 15:56 ` David Hildenbrand (Arm)
2026-06-01 16:03 ` Lorenzo Stoakes
2026-06-01 16:01 ` Lorenzo Stoakes
1 sibling, 1 reply; 19+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-01 15:56 UTC (permalink / raw)
To: Dev Jain, Lorenzo Stoakes, Andrew Morton
Cc: Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Barry Song, Lance Yang, SeongJae Park, Balbir Singh, linux-mm,
linux-kernel, Wei Yang
On 6/1/26 17:50, Dev Jain wrote:
>
>
> On 01/06/26 2:00 pm, Lorenzo Stoakes wrote:
>> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>> device-private entries") updated set_pmd_migration_entry() to use
>> pmdp_huge_get_and_clear() in the softleaf case, but made no further
>> adjustments to the function itself.
>>
>> Therefore this function continues to incorrectly use pmd_write(),
>> pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
>> migration entry should be marked writable, softdirty or uffd-wp
>> respectively.
>>
>> Whilst all are incorrect, the most problematic of these is pmd_write(), as
>> this can lead to corrupted rmap state.
>>
>> On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
>> pmd_write() on a softleaf will return the softdirty state encoded in the
>> entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
>>
>> This was observed when running the hmm.hmm_device_private.anon_write_child
>> selftest:
>>
>> 1. The test faults in a range then migrates it such that a device-private
>> THP range is established.
>>
>> 2. The parent then migrates it to a device-private writable PMD entry whose
>> folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
>> (accidentally correct write state).
>>
>> 3. The parent forks and the PMD entries are set to device-private read only
>> entries, entire_mapcount=2, softdirty still set.
>>
>> 4. [BUG] The child writes to the range then migrates to RAM - intending to
>> install non-writable migration entries - but replacing parent and child
>> PMD mappings with WRITABLE entries due to misinterpreting the softdirty
>> bit.
>>
>> 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
>> set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
>> both parent and child, which are therefore AnonExclusive.
>>
>> 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
>> entire_mapcount=2 and we end up with an AnonExclusive folio with
>> entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
>>
>> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>> folio_entire_mapcount(folio) > 1 &&
>> PageAnonExclusive(cur_page), folio)
>>
>> This patch fixes the issue by correctly referencing the softleaf entry
>> fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
>>
>> It also only updates A/D flags if the entry is present as these are
>> otherwise not meaningful for a softleaf entry.
>>
>> This patch also flips the if (!present) { ... } else { ... } logic in
>> set_pmd_migration_entry() so it is easier to understand, and adds some
>> comments to make things clearer.
>>
>> I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
>> device private THP test infrastructure") which first exposes this bug as it
>> was the commit that permitted test_hmm to generate the test.
>>
>> However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
>> device-private entries") is the commit that actually enabled this
>> behaviour.
>>
>> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
>> ---
>
> Sashiko continues to find existing problems :) What do you think:
>
> https://sashiko.dev/#/patchset/20260601083044.57132-1-ljs%40kernel.org
IIRC, Wei is already working on fixing what it reports here regarding
try_to_migrate_one(). So it's old news.
https://lore.kernel.org/r/20260508013728.21285-1-richard.weiyang@gmail.com
--
Cheers,
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 15:56 ` David Hildenbrand (Arm)
@ 2026-06-01 16:03 ` Lorenzo Stoakes
0 siblings, 0 replies; 19+ messages in thread
From: Lorenzo Stoakes @ 2026-06-01 16:03 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Dev Jain, Andrew Morton, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Barry Song, Lance Yang, SeongJae Park,
Balbir Singh, linux-mm, linux-kernel, Wei Yang
On Mon, Jun 01, 2026 at 05:56:45PM +0200, David Hildenbrand (Arm) wrote:
> On 6/1/26 17:50, Dev Jain wrote:
> > Sashiko continues to find existing problems :) What do you think:
> >
> > https://sashiko.dev/#/patchset/20260601083044.57132-1-ljs%40kernel.org
>
> IIRC, Wei is already working on fixing what it reports here regarding
> try_to_migrate_one(). So it's old news.
>
> https://lore.kernel.org/r/20260508013728.21285-1-richard.weiyang@gmail.com
Thanks, wasn't aware!
But it's also irrelevant: an 'existing broken thing' has ZERO to do with a patch
addressing something else.
I kinda wish Sashiko didn't do it.
>
> --
> Cheers,
>
> David
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 15:50 ` Dev Jain
2026-06-01 15:56 ` David Hildenbrand (Arm)
@ 2026-06-01 16:01 ` Lorenzo Stoakes
2026-06-01 16:27 ` Dev Jain
1 sibling, 1 reply; 19+ messages in thread
From: Lorenzo Stoakes @ 2026-06-01 16:01 UTC (permalink / raw)
To: Dev Jain
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Barry Song,
Lance Yang, SeongJae Park, Balbir Singh, linux-mm, linux-kernel
On Mon, Jun 01, 2026 at 09:20:51PM +0530, Dev Jain wrote:
> Sashiko continues to find existing problems :) What do you think:
>
> https://sashiko.dev/#/patchset/20260601083044.57132-1-ljs%40kernel.org
Thanks for highlighting Dev and to be clear I'm not yelling at you :P I'm
yelling about this aspect of sashiko... :)
So, this patch fixes a serious issue that renders THP device-private
completely broken in a way that can lead to memory corruption, and this
review comment has nothing to do with that :)
So TL;DR - maybe as a follow up?
IMO there's _no_ obligation to respond to stuff like this, in the same
way as somebody in a review saying 'hey here's this unrelated broken
thing'.
And this kind of thing should _never_ _ever_ block a series or patch.
(I kinda wish sashiko didn't do it, I have extremely limited time as it is,
it'd be better as a passive background scan of existing issues or
something.)
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 16:01 ` Lorenzo Stoakes
@ 2026-06-01 16:27 ` Dev Jain
0 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2026-06-01 16:27 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Barry Song,
Lance Yang, SeongJae Park, Balbir Singh, linux-mm, linux-kernel
On 01/06/26 9:31 pm, Lorenzo Stoakes wrote:
> On Mon, Jun 01, 2026 at 09:20:51PM +0530, Dev Jain wrote:
>> Sashiko continues to find existing problems :) What do you think:
>>
>> https://sashiko.dev/#/patchset/20260601083044.57132-1-ljs%40kernel.org
>
> Thanks for highlighting Dev and to be clear I'm not yelling at you :P I'm
> yelling about this aspect of sashiko... :)
>
> So, this patch fixes a serious issue that renders THP device-private
> completely broken in a way that can lead to memory corruption, and this
> review comment has nothing to do with that :)
>
> So TL;DR - maybe as a follow up?
>
> IMO there's _no_ obligation to respond to stuff like this, in the same
> way as somebody in a review saying 'hey here's this unrelated broken
> thing'.
>
> And this kind of thing should _never_ _ever_ block a series or patch.
Yep I agree.
Thanks David for bringing up the link.
>
> (I kinda wish sashiko didn't do it, I have extremely limited time as it is,
> it'd be better as a passive background scan of existing issues or
> something.)
>
> Cheers, Lorenzo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 8:30 [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry Lorenzo Stoakes
2026-06-01 8:55 ` Lorenzo Stoakes
2026-06-01 15:50 ` Dev Jain
@ 2026-06-01 16:44 ` Dev Jain
2026-06-05 10:07 ` Lorenzo Stoakes
2026-06-01 20:30 ` Balbir Singh
` (5 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Dev Jain @ 2026-06-01 16:44 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Barry Song, Lance Yang, SeongJae Park,
Balbir Singh, linux-mm, linux-kernel
On 01/06/26 2:00 pm, Lorenzo Stoakes wrote:
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") updated set_pmd_migration_entry() to use
> pmdp_huge_get_and_clear() in the softleaf case, but made no further
> adjustments to the function itself.
>
> Therefore this function continues to incorrectly use pmd_write(),
> pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> migration entry should be marked writable, softdirty or uffd-wp
> respectively.
>
> Whilst all are incorrect, the most problematic of these is pmd_write(), as
> this can lead to corrupted rmap state.
>
> On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> pmd_write() on a softleaf will return the softdirty state encoded in the
> entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
>
> This was observed when running the hmm.hmm_device_private.anon_write_child
> selftest:
>
> 1. The test faults in a range then migrates it such that a device-private
> THP range is established.
>
> 2. The parent then migrates it to a device-private writable PMD entry whose
> folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> (accidentally correct write state).
>
> 3. The parent forks and the PMD entries are set to device-private read only
> entries, entire_mapcount=2, softdirty still set.
>
> 4. [BUG] The child writes to the range then migrates to RAM - intending to
> install non-writable migration entries - but replacing parent and child
> PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> bit.
>
> 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> both parent and child, which are therefore AnonExclusive.
>
> 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> entire_mapcount=2 and we end up with an AnonExclusive folio with
> entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
>
> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> folio_entire_mapcount(folio) > 1 &&
> PageAnonExclusive(cur_page), folio)
>
> This patch fixes the issue by correctly referencing the softleaf entry
> fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
>
> It also only updates A/D flags if the entry is present as these are
> otherwise not meaningful for a softleaf entry.
>
> This patch also flips the if (!present) { ... } else { ... } logic in
> set_pmd_migration_entry() so it is easier to understand, and adds some
> comments to make things clearer.
>
> I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> device private THP test infrastructure") which first exposes this bug as it
> was the commit that permitted test_hmm to generate the test.
>
> However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") is the commit that actually enabled this
> behaviour.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> ---
Had a look again and can't find anything so:
Reviewed-by: Dev Jain <dev.jain@arm.com>
> mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
> 1 file changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bf9b480bb3b0..79463c709c98 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> struct vm_area_struct *vma = pvmw->vma;
> struct mm_struct *mm = vma->vm_mm;
> unsigned long address = pvmw->address;
> - bool anon_exclusive;
> + bool anon_exclusive, present, writable, softdirty, uffd_wp;
> pmd_t pmdval;
> swp_entry_t entry;
> pmd_t pmdswp;
> @@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> if (!(pvmw->pmd && !pvmw->pte))
> return 0;
>
> - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> - if (unlikely(!pmd_present(*pvmw->pmd)))
> - pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> - else
> + present = pmd_present(*pvmw->pmd);
> + if (likely(present)) {
> + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> +
> pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
>
> + writable = pmd_write(pmdval);
> + softdirty = pmd_soft_dirty(pmdval);
> + uffd_wp = pmd_uffd_wp(pmdval);
> + } else {
> + softleaf_t old_entry;
> +
> + pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> + old_entry = softleaf_from_pmd(pmdval);
> +
> + writable = softleaf_is_device_private_write(old_entry);
> + softdirty = pmd_swp_soft_dirty(pmdval);
> + uffd_wp = pmd_swp_uffd_wp(pmdval);
> + }
> +
> /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
> anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
> if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) {
> @@ -5003,24 +5017,31 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> return -EBUSY;
> }
>
> - if (pmd_dirty(pmdval))
> - folio_mark_dirty(folio);
> - if (pmd_write(pmdval))
> + /* Determine type of migration entry. */
> + if (writable)
> entry = make_writable_migration_entry(page_to_pfn(page));
> else if (anon_exclusive)
> entry = make_readable_exclusive_migration_entry(page_to_pfn(page));
> else
> entry = make_readable_migration_entry(page_to_pfn(page));
> - if (pmd_young(pmdval))
> +
> + /* Set A/D bits as necessary. */
> + if (present && pmd_young(pmdval))
> entry = make_migration_entry_young(entry);
> - if (pmd_dirty(pmdval))
> + if (present && pmd_dirty(pmdval)) {
> + folio_mark_dirty(folio);
> entry = make_migration_entry_dirty(entry);
> + }
> +
> + /* Set PMD. */
> pmdswp = swp_entry_to_pmd(entry);
> - if (pmd_soft_dirty(pmdval))
> + if (softdirty)
> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> - if (pmd_uffd_wp(pmdval))
> + if (uffd_wp)
> pmdswp = pmd_swp_mkuffd_wp(pmdswp);
> set_pmd_at(mm, address, pvmw->pmd, pmdswp);
> +
> + /* Migration entry installed: cleanup rmap, folio. */
> folio_remove_rmap_pmd(folio, page, vma);
> folio_put(folio);
> trace_set_migration_pmd(address, pmd_val(pmdswp));
> --
> 2.54.0
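Steps 4 to 6 of the quoted reproduction can be walked through with a toy model of the migration entry type and the rmap state. This is a sketch under stated assumptions, not kernel code: every model_* name is hypothetical, the folio is assumed to be non-exclusive after fork, and the "restore" helper only mimics the rule quoted from remove_migration_pmd() (anything other than a plain readable migration entry is re-added as exclusive).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types; none of these are kernel definitions. */
enum model_migration_type {
	MODEL_MIG_READ,
	MODEL_MIG_READ_EXCLUSIVE,
	MODEL_MIG_WRITE,
};

struct model_folio {
	int entire_mapcount;
	bool anon_exclusive;
};

/* Same ordering as the if / else if / else chain in the patch. */
static enum model_migration_type
model_pick_migration_type(bool writable, bool anon_exclusive)
{
	if (writable)
		return MODEL_MIG_WRITE;
	if (anon_exclusive)
		return MODEL_MIG_READ_EXCLUSIVE;
	return MODEL_MIG_READ;
}

/*
 * Restore one mapping onto the destination folio. Returns false when the
 * modelled invariant breaks: an exclusive folio mapped more than once.
 */
static bool model_restore_mapping(struct model_folio *folio,
				  enum model_migration_type type)
{
	folio->entire_mapcount++;
	if (type != MODEL_MIG_READ)
		folio->anon_exclusive = true;
	return !(folio->entire_mapcount > 1 && folio->anon_exclusive);
}

/* Parent and child both hold read-only entries with soft-dirty set. */
static bool model_fork_then_migrate(bool use_buggy_accessor)
{
	const bool softdirty = true;		/* step 3: still set */
	const bool really_writable = false;	/* step 3: read only */
	const bool anon_exclusive = false;	/* assumed shared after fork */
	bool writable = use_buggy_accessor ? softdirty : really_writable;
	enum model_migration_type type =
		model_pick_migration_type(writable, anon_exclusive);
	struct model_folio dst = { 0, false };
	bool ok;

	ok = model_restore_mapping(&dst, type);		/* child */
	ok = model_restore_mapping(&dst, type) && ok;	/* parent */
	return ok;
}
```

With the buggy accessor, both mappings become writable migration entries and the second restore trips the modelled invariant, which corresponds to the VM_WARN_ON_FOLIO() in the report. With the corrected accessor, both are plain readable entries and the invariant holds.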
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 16:44 ` Dev Jain
@ 2026-06-05 10:07 ` Lorenzo Stoakes
2026-06-05 13:22 ` Dev Jain
0 siblings, 1 reply; 19+ messages in thread
From: Lorenzo Stoakes @ 2026-06-05 10:07 UTC (permalink / raw)
To: Dev Jain
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Barry Song,
Lance Yang, SeongJae Park, Balbir Singh, linux-mm, linux-kernel
By the way,
I think maybe the reason you didn't hit this in your work on the spurious
warning stuff for hmm-tests is that I also had to set CONFIG_DEVICE_PRIVATE
(as well as CONFIG_TEST_HMM) to get this to trigger.
I think the reason others maybe didn't see it is because the self tests
will just skip the hmm tests if CONFIG_TEST_HMM is not set.
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-05 10:07 ` Lorenzo Stoakes
@ 2026-06-05 13:22 ` Dev Jain
0 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2026-06-05 13:22 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Barry Song,
Lance Yang, SeongJae Park, Balbir Singh, linux-mm, linux-kernel
On 05/06/26 3:37 pm, Lorenzo Stoakes wrote:
> By the way,
>
> I think maybe the reason you didn't hit this in your work on the spurious
> warning stuff for hmm-tests is that I also had to set CONFIG_DEVICE_PRIVATE
> (as well as CONFIG_TEST_HMM) to get this to trigger.
>
> I think the reason others maybe didn't see it is because the self tests
> will just skip the hmm tests if CONFIG_TEST_HMM is not set.
I have all of that set, still can't hit this.
Instead, I hit something else when I was doing some other work -
https://lore.kernel.org/all/3a25e7fd-84a7-49a6-92a3-96492fe5d2cc@arm.com/
>
> Cheers, Lorenzo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 8:30 [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry Lorenzo Stoakes
` (2 preceding siblings ...)
2026-06-01 16:44 ` Dev Jain
@ 2026-06-01 20:30 ` Balbir Singh
2026-06-02 9:17 ` Lorenzo Stoakes
2026-06-02 3:29 ` Baolin Wang
` (4 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Balbir Singh @ 2026-06-01 20:30 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
SeongJae Park, linux-mm, linux-kernel
On 6/1/26 18:30, Lorenzo Stoakes wrote:
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") updated set_pmd_migration_entry() to use
> pmdp_huge_get_and_clear() in the softleaf case, but made no further
> adjustments to the function itself.
>
> Therefore this function continues to incorrectly use pmd_write(),
> pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> migration entry should be marked writable, softdirty or uffd-wp
> respectively.
>
> Whilst all are incorrect, the most problematic of these is pmd_write(), as
> this can lead to corrupted rmap state.
>
> On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> pmd_write() on a softleaf will return the softdirty state encoded in the
> entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
>
> This was observed when running the hmm.hmm_device_private.anon_write_child
> selftest:
>
> 1. The test faults in a range then migrates it such that a device-private
> THP range is established.
>
> 2. The parent then migrates it to a device-private writable PMD entry whose
> folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> (accidentally correct write state).
>
> 3. The parent forks and the PMD entries are set to device-private read only
> entries, entire_mapcount=2, softdirty still set.
>
> 4. [BUG] The child writes to the range then migrates to RAM - intending to
> install non-writable migration entries - but replacing parent and child
> PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> bit.
>
> 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> both parent and child, which are therefore AnonExclusive.
>
> 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> entire_mapcount=2 and we end up with an AnonExclusive folio with
> entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
>
> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> folio_entire_mapcount(folio) > 1 &&
> PageAnonExclusive(cur_page), folio)
>
Thanks for the explanation. I wonder why I've not run into this during
my testing; I do have DEBUG_VM enabled in my config. Perhaps I've
never had soft dirty set.
> This patch fixes the issue by correctly referencing the softleaf entry
> fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
>
> It also only updates A/D flags if the entry is present as these are
> otherwise not meaningful for a softleaf entry.
>
> This patch also flips the if (!present) { ... } else { ... } logic in
> set_pmd_migration_entry() so it is easier to understand, and adds some
> comments to make things clearer.
>
> I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> device private THP test infrastructure") which first exposes this bug as it
> was the commit that permitted test_hmm to generate the test.
>
> However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") is the commit that actually enabled this
> behaviour.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> ---
> mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
> 1 file changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bf9b480bb3b0..79463c709c98 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> struct vm_area_struct *vma = pvmw->vma;
> struct mm_struct *mm = vma->vm_mm;
> unsigned long address = pvmw->address;
> - bool anon_exclusive;
> + bool anon_exclusive, present, writable, softdirty, uffd_wp;
> pmd_t pmdval;
> swp_entry_t entry;
> pmd_t pmdswp;
> @@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> if (!(pvmw->pmd && !pvmw->pte))
> return 0;
>
> - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> - if (unlikely(!pmd_present(*pvmw->pmd)))
> - pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> - else
> + present = pmd_present(*pvmw->pmd);
> + if (likely(present)) {
> + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> +
> pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
>
> + writable = pmd_write(pmdval);
> + softdirty = pmd_soft_dirty(pmdval);
> + uffd_wp = pmd_uffd_wp(pmdval);
> + } else {
> + softleaf_t old_entry;
> +
> + pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> + old_entry = softleaf_from_pmd(pmdval);
> +
> + writable = softleaf_is_device_private_write(old_entry);
> + softdirty = pmd_swp_soft_dirty(pmdval);
> + uffd_wp = pmd_swp_uffd_wp(pmdval);
> + }
> +
> /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
> anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
> if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) {
> @@ -5003,24 +5017,31 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> return -EBUSY;
> }
>
> - if (pmd_dirty(pmdval))
> - folio_mark_dirty(folio);
> - if (pmd_write(pmdval))
> + /* Determine type of migration entry. */
> + if (writable)
> entry = make_writable_migration_entry(page_to_pfn(page));
> else if (anon_exclusive)
> entry = make_readable_exclusive_migration_entry(page_to_pfn(page));
> else
> entry = make_readable_migration_entry(page_to_pfn(page));
> - if (pmd_young(pmdval))
> +
> + /* Set A/D bits as necessary. */
> + if (present && pmd_young(pmdval))
> entry = make_migration_entry_young(entry);
> - if (pmd_dirty(pmdval))
> + if (present && pmd_dirty(pmdval)) {
> + folio_mark_dirty(folio);
> entry = make_migration_entry_dirty(entry);
> + }
> +
> + /* Set PMD. */
> pmdswp = swp_entry_to_pmd(entry);
> - if (pmd_soft_dirty(pmdval))
> + if (softdirty)
> pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> - if (pmd_uffd_wp(pmdval))
> + if (uffd_wp)
> pmdswp = pmd_swp_mkuffd_wp(pmdswp);
> set_pmd_at(mm, address, pvmw->pmd, pmdswp);
> +
> + /* Migration entry installed: cleanup rmap, folio. */
> folio_remove_rmap_pmd(folio, page, vma);
> folio_put(folio);
> trace_set_migration_pmd(address, pmd_val(pmdswp));
> --
Reviewed-by: Balbir Singh <balbirs@nvidia.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 20:30 ` Balbir Singh
@ 2026-06-02 9:17 ` Lorenzo Stoakes
0 siblings, 0 replies; 19+ messages in thread
From: Lorenzo Stoakes @ 2026-06-02 9:17 UTC (permalink / raw)
To: Balbir Singh
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, SeongJae Park, linux-mm, linux-kernel
On Tue, Jun 02, 2026 at 06:30:45AM +1000, Balbir Singh wrote:
> On 6/1/26 18:30, Lorenzo Stoakes wrote:
> > Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> > device-private entries") updated set_pmd_migration_entry() to use
> > pmdp_huge_get_and_clear() in the softleaf case, but made no further
> > adjustments to the function itself.
> >
> > Therefore this function continues to incorrectly use pmd_write(),
> > pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> > migration entry should be marked writable, softdirty or uffd-wp
> > respectively.
> >
> > Whilst all are incorrect, the most problematic of these is pmd_write(), as
> > this can lead to corrupted rmap state.
> >
> > On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> > pmd_write() on a softleaf will return the softdirty state encoded in the
> > entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
> >
> > This was observed when running the hmm.hmm_device_private.anon_write_child
> > selftest:
> >
> > 1. The test faults in a range then migrates it such that a device-private
> > THP range is established.
> >
> > 2. The parent then migrates it to a device-private writable PMD entry whose
> > folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> > (accidentally correct write state).
> >
> > 3. The parent forks and the PMD entries are set to device-private read only
> > entries, entire_mapcount=2, softdirty still set.
> >
> > 4. [BUG] The child writes to the range then migrates to RAM - intending to
> > install non-writable migration entries - but replacing parent and child
> > PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> > bit.
> >
> > 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> > set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> > both parent and child, which are therefore AnonExclusive.
> >
> > 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> > entire_mapcount=2 and we end up with an AnonExclusive folio with
> > entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
> >
> > VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> > folio_entire_mapcount(folio) > 1 &&
> > PageAnonExclusive(cur_page), folio)
> >
>
> Thanks for the explanation. I wonder why I've not run into this during
> my testing; I do have DEBUG_VM enabled in my config. I wonder if I've
> never had soft dirty set.
No worries! I happened to hit it when reviewing a patch and testing
locally.
Yeah, I did wonder why others didn't hit it - I guess the HMM tests are
easily skipped if the module wasn't built, for one, and perhaps either
CONFIG_DEBUG_VM or CONFIG_MEM_SOFT_DIRTY was not set?
There also might be some other factor that my config happens to trigger
that others do not?
>
> > This patch fixes the issue by correctly referencing the softleaf entry
> > fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
> >
> > It also only updates A/D flags if the entry is present as these are
> > otherwise not meaningful for a softleaf entry.
> >
> > This patch also flips the if (!present) { ... } else { ... } logic in
> > set_pmd_migration_entry() so it is easier to understand, and adds some
> > comments to make things clearer.
> >
> > I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> > device private THP test infrastructure") which first exposes this bug as it
> > was the commit that permitted test_hmm to generate the test.
> >
> > However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> > device-private entries") is the commit that actually enabled this
> > behaviour.
> >
> > Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> > ---
> > mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
> > 1 file changed, 33 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index bf9b480bb3b0..79463c709c98 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > struct vm_area_struct *vma = pvmw->vma;
> > struct mm_struct *mm = vma->vm_mm;
> > unsigned long address = pvmw->address;
> > - bool anon_exclusive;
> > + bool anon_exclusive, present, writable, softdirty, uffd_wp;
> > pmd_t pmdval;
> > swp_entry_t entry;
> > pmd_t pmdswp;
> > @@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > if (!(pvmw->pmd && !pvmw->pte))
> > return 0;
> >
> > - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> > - if (unlikely(!pmd_present(*pvmw->pmd)))
> > - pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> > - else
> > + present = pmd_present(*pvmw->pmd);
> > + if (likely(present)) {
> > + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> > +
> > pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
> >
> > + writable = pmd_write(pmdval);
> > + softdirty = pmd_soft_dirty(pmdval);
> > + uffd_wp = pmd_uffd_wp(pmdval);
> > + } else {
> > + softleaf_t old_entry;
> > +
> > + pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> > + old_entry = softleaf_from_pmd(pmdval);
> > +
> > + writable = softleaf_is_device_private_write(old_entry);
> > + softdirty = pmd_swp_soft_dirty(pmdval);
> > + uffd_wp = pmd_swp_uffd_wp(pmdval);
> > + }
> > +
> > /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
> > anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
> > if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) {
> > @@ -5003,24 +5017,31 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > return -EBUSY;
> > }
> >
> > - if (pmd_dirty(pmdval))
> > - folio_mark_dirty(folio);
> > - if (pmd_write(pmdval))
> > + /* Determine type of migration entry. */
> > + if (writable)
> > entry = make_writable_migration_entry(page_to_pfn(page));
> > else if (anon_exclusive)
> > entry = make_readable_exclusive_migration_entry(page_to_pfn(page));
> > else
> > entry = make_readable_migration_entry(page_to_pfn(page));
> > - if (pmd_young(pmdval))
> > +
> > + /* Set A/D bits as necessary. */
> > + if (present && pmd_young(pmdval))
> > entry = make_migration_entry_young(entry);
> > - if (pmd_dirty(pmdval))
> > + if (present && pmd_dirty(pmdval)) {
> > + folio_mark_dirty(folio);
> > entry = make_migration_entry_dirty(entry);
> > + }
> > +
> > + /* Set PMD. */
> > pmdswp = swp_entry_to_pmd(entry);
> > - if (pmd_soft_dirty(pmdval))
> > + if (softdirty)
> > pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> > - if (pmd_uffd_wp(pmdval))
> > + if (uffd_wp)
> > pmdswp = pmd_swp_mkuffd_wp(pmdswp);
> > set_pmd_at(mm, address, pvmw->pmd, pmdswp);
> > +
> > + /* Migration entry installed: cleanup rmap, folio. */
> > folio_remove_rmap_pmd(folio, page, vma);
> > folio_put(folio);
> > trace_set_migration_pmd(address, pmd_val(pmdswp));
> > --
>
>
> Reviewed-by: Balbir Singh <balbirs@nvidia.com>
Thanks!
>
>
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 8:30 [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry Lorenzo Stoakes
` (3 preceding siblings ...)
2026-06-01 20:30 ` Balbir Singh
@ 2026-06-02 3:29 ` Baolin Wang
2026-06-02 4:09 ` Oscar Salvador (SUSE)
` (3 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Baolin Wang @ 2026-06-02 3:29 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: David Hildenbrand, Zi Yan, Liam R . Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, SeongJae Park,
Balbir Singh, linux-mm, linux-kernel
On 6/1/26 4:30 PM, Lorenzo Stoakes wrote:
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") updated set_pmd_migration_entry() to use
> pmdp_huge_get_and_clear() in the softleaf case, but made no further
> adjustments to the function itself.
>
> Therefore this function continues to incorrectly use pmd_write(),
> pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> migration entry should be marked writable, softdirty or uffd-wp
> respectively.
>
> Whilst all are incorrect, the most problematic of these is pmd_write(), as
> this can lead to corrupted rmap state.
>
> On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> pmd_write() on a softleaf will return the softdirty state encoded in the
> entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
>
> This was observed when running the hmm.hmm_device_private.anon_write_child
> selftest:
>
> 1. The test faults in a range then migrates it such that a device-private
> THP range is established.
>
> 2. The parent then migrates it to a device-private writable PMD entry whose
> folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> (accidentally correct write state).
>
> 3. The parent forks and the PMD entries are set to device-private read only
> entries, entire_mapcount=2, softdirty still set.
>
> 4. [BUG] The child writes to the range then migrates to RAM - intending to
> install non-writable migration entries - but replacing parent and child
> PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> bit.
>
> 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> both parent and child, which are therefore AnonExclusive.
>
> 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> entire_mapcount=2 and we end up with an AnonExclusive folio with
> entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
>
> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> folio_entire_mapcount(folio) > 1 &&
> PageAnonExclusive(cur_page), folio)
>
> This patch fixes the issue by correctly referencing the softleaf entry
> fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
>
> It also only updates A/D flags if the entry is present as these are
> otherwise not meaningful for a softleaf entry.
>
> This patch also flips the if (!present) { ... } else { ... } logic in
> set_pmd_migration_entry() so it is easier to understand, and adds some
> comments to make things clearer.
>
> I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> device private THP test infrastructure") which first exposes this bug as it
> was the commit that permitted test_hmm to generate the test.
>
> However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") is the commit that actually enabled this
> behaviour.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> ---
Thanks for your detailed explanation. Feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 8:30 [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry Lorenzo Stoakes
` (4 preceding siblings ...)
2026-06-02 3:29 ` Baolin Wang
@ 2026-06-02 4:09 ` Oscar Salvador (SUSE)
2026-06-02 4:38 ` Barry Song
` (2 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Oscar Salvador (SUSE) @ 2026-06-02 4:09 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, SeongJae Park, Balbir Singh, linux-mm, linux-kernel
On Mon, Jun 01, 2026 at 09:30:44AM +0100, Lorenzo Stoakes wrote:
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") updated set_pmd_migration_entry() to use
> pmdp_huge_get_and_clear() in the softleaf case, but made no further
> adjustments to the function itself.
>
> Therefore this function continues to incorrectly use pmd_write(),
> pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> migration entry should be marked writable, softdirty or uffd-wp
> respectively.
>
> Whilst all are incorrect, the most problematic of these is pmd_write(), as
> this can lead to corrupted rmap state.
>
> On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> pmd_write() on a softleaf will return the softdirty state encoded in the
> entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
>
> This was observed when running the hmm.hmm_device_private.anon_write_child
> selftest:
>
> 1. The test faults in a range then migrates it such that a device-private
> THP range is established.
>
> 2. The parent then migrates it to a device-private writable PMD entry whose
> folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> (accidentally correct write state).
>
> 3. The parent forks and the PMD entries are set to device-private read only
> entries, entire_mapcount=2, softdirty still set.
>
> 4. [BUG] The child writes to the range then migrates to RAM - intending to
> install non-writable migration entries - but replacing parent and child
> PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> bit.
>
> 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> both parent and child, which are therefore AnonExclusive.
>
> 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> entire_mapcount=2 and we end up with an AnonExclusive folio with
> entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
>
> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> folio_entire_mapcount(folio) > 1 &&
> PageAnonExclusive(cur_page), folio)
>
> This patch fixes the issue by correctly referencing the softleaf entry
> fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
>
> It also only updates A/D flags if the entry is present as these are
> otherwise not meaningful for a softleaf entry.
>
> This patch also flips the if (!present) { ... } else { ... } logic in
> set_pmd_migration_entry() so it is easier to understand, and adds some
> comments to make things clearer.
>
> I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> device private THP test infrastructure") which first exposes this bug as it
> was the commit that permitted test_hmm to generate the test.
>
> However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") is the commit that actually enabled this
> behaviour.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
LGTM,
Reviewed-by: Oscar Salvador (SUSE) <osalvador@kernel.org>
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 8:30 [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry Lorenzo Stoakes
` (5 preceding siblings ...)
2026-06-02 4:09 ` Oscar Salvador (SUSE)
@ 2026-06-02 4:38 ` Barry Song
2026-06-02 6:32 ` Lance Yang
2026-06-02 14:40 ` Zi Yan
8 siblings, 0 replies; 19+ messages in thread
From: Barry Song @ 2026-06-02 4:38 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Lance Yang,
SeongJae Park, Balbir Singh, linux-mm, linux-kernel
On Mon, Jun 1, 2026 at 4:30 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") updated set_pmd_migration_entry() to use
> pmdp_huge_get_and_clear() in the softleaf case, but made no further
> adjustments to the function itself.
>
> Therefore this function continues to incorrectly use pmd_write(),
> pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> migration entry should be marked writable, softdirty or uffd-wp
> respectively.
>
> Whilst all are incorrect, the most problematic of these is pmd_write(), as
> this can lead to corrupted rmap state.
>
> On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> pmd_write() on a softleaf will return the softdirty state encoded in the
> entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
>
> This was observed when running the hmm.hmm_device_private.anon_write_child
> selftest:
>
> 1. The test faults in a range then migrates it such that a device-private
> THP range is established.
>
> 2. The parent then migrates it to a device-private writable PMD entry whose
> folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> (accidentally correct write state).
>
> 3. The parent forks and the PMD entries are set to device-private read only
> entries, entire_mapcount=2, softdirty still set.
>
> 4. [BUG] The child writes to the range then migrates to RAM - intending to
> install non-writable migration entries - but replacing parent and child
> PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> bit.
>
> 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> both parent and child, which are therefore AnonExclusive.
>
> 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> entire_mapcount=2 and we end up with an AnonExclusive folio with
> entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
>
> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> folio_entire_mapcount(folio) > 1 &&
> PageAnonExclusive(cur_page), folio)
>
> This patch fixes the issue by correctly referencing the softleaf entry
> fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
>
> It also only updates A/D flags if the entry is present as these are
> otherwise not meaningful for a softleaf entry.
>
> This patch also flips the if (!present) { ... } else { ... } logic in
> set_pmd_migration_entry() so it is easier to understand, and adds some
> comments to make things clearer.
>
> I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> device private THP test infrastructure") which first exposes this bug as it
> was the commit that permitted test_hmm to generate the test.
>
> However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") is the commit that actually enabled this
> behaviour.
Thanks for the excellent changelog and the detailed steps.
Reviewed-by: Barry Song <baohua@kernel.org>
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 8:30 [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry Lorenzo Stoakes
` (6 preceding siblings ...)
2026-06-02 4:38 ` Barry Song
@ 2026-06-02 6:32 ` Lance Yang
2026-06-02 14:40 ` Zi Yan
8 siblings, 0 replies; 19+ messages in thread
From: Lance Yang @ 2026-06-02 6:32 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Andrew Morton, Ryan Roberts, Dev Jain, Barry Song,
SeongJae Park, Balbir Singh, linux-mm, linux-kernel
On 2026/6/1 16:30, Lorenzo Stoakes wrote:
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") updated set_pmd_migration_entry() to use
> pmdp_huge_get_and_clear() in the softleaf case, but made no further
> adjustments to the function itself.
>
> Therefore this function continues to incorrectly use pmd_write(),
> pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> migration entry should be marked writable, softdirty or uffd-wp
> respectively.
>
> Whilst all are incorrect, the most problematic of these is pmd_write(), as
> this can lead to corrupted rmap state.
>
> On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> pmd_write() on a softleaf will return the softdirty state encoded in the
> entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
>
> This was observed when running the hmm.hmm_device_private.anon_write_child
> selftest:
>
> 1. The test faults in a range then migrates it such that a device-private
> THP range is established.
>
> 2. The parent then migrates it to a device-private writable PMD entry whose
> folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> (accidentally correct write state).
>
> 3. The parent forks and the PMD entries are set to device-private read only
> entries, entire_mapcount=2, softdirty still set.
>
> 4. [BUG] The child writes to the range then migrates to RAM - intending to
> install non-writable migration entries - but replacing parent and child
> PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> bit.
>
> 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> both parent and child, which are therefore AnonExclusive.
>
> 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> entire_mapcount=2 and we end up with an AnonExclusive folio with
> entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
>
> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> folio_entire_mapcount(folio) > 1 &&
> PageAnonExclusive(cur_page), folio)
>
> This patch fixes the issue by correctly referencing the softleaf entry
> fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
>
> It also only updates A/D flags if the entry is present as these are
> otherwise not meaningful for a softleaf entry.
>
> This patch also flips the if (!present) { ... } else { ... } logic in
> set_pmd_migration_entry() so it is easier to understand, and adds some
> comments to make things clearer.
>
> I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> device private THP test infrastructure") which first exposes this bug as it
> was the commit that permitted test_hmm to generate the test.
>
> However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") is the commit that actually enabled this
> behaviour.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> ---
Cool, lesson learned! Feel free to add:
Reviewed-by: Lance Yang <lance.yang@linux.dev>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-01 8:30 [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry Lorenzo Stoakes
` (7 preceding siblings ...)
2026-06-02 6:32 ` Lance Yang
@ 2026-06-02 14:40 ` Zi Yan
2026-06-02 17:26 ` Lorenzo Stoakes
8 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2026-06-02 14:40 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, David Hildenbrand, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
SeongJae Park, Balbir Singh, linux-mm, linux-kernel
On 1 Jun 2026, at 4:30, Lorenzo Stoakes wrote:
> Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") updated set_pmd_migration_entry() to use
> pmdp_huge_get_and_clear() in the softleaf case, but made no further
> adjustments to the function itself.
>
> Therefore this function continues to incorrectly use pmd_write(),
> pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> migration entry should be marked writable, softdirty or uffd-wp
> respectively.
>
> Whilst all are incorrect, the most problematic of these is pmd_write(), as
> this can lead to corrupted rmap state.
>
> On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> pmd_write() on a softleaf will return the softdirty state encoded in the
> entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
>
> This was observed when running the hmm.hmm_device_private.anon_write_child
> selftest:
>
> 1. The test faults in a range then migrates it such that a device-private
> THP range is established.
>
> 2. The parent then migrates it to a device-private writable PMD entry whose
> folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> (accidentally correct write state).
>
> 3. The parent forks and the PMD entries are set to device-private read only
> entries, entire_mapcount=2, softdirty still set.
>
> 4. [BUG] The child writes to the range then migrates to RAM - intending to
> install non-writable migration entries - but replacing parent and child
> PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> bit.
>
> 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> both parent and child, which are therefore AnonExclusive.
>
> 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> entire_mapcount=2 and we end up with an AnonExclusive folio with
> entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
>
> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> folio_entire_mapcount(folio) > 1 &&
> PageAnonExclusive(cur_page), folio)
>
> This patch fixes the issue by correctly referencing the softleaf entry
> fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
>
> It also only updates A/D flags if the entry is present as these are
> otherwise not meaningful for a softleaf entry.
>
> This patch also flips the if (!present) { ... } else { ... } logic in
> set_pmd_migration_entry() so it is easier to understand, and adds some
> comments to make things clearer.
>
> I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> device private THP test infrastructure") which first exposes this bug as it
> was the commit that permitted test_hmm to generate the test.
>
> However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> device-private entries") is the commit that actually enabled this
> behaviour.
Thanks for the detailed explanation.
>
> Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> Cc: stable@vger.kernel.org
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> ---
> mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
> 1 file changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bf9b480bb3b0..79463c709c98 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> struct vm_area_struct *vma = pvmw->vma;
> struct mm_struct *mm = vma->vm_mm;
> unsigned long address = pvmw->address;
> - bool anon_exclusive;
> + bool anon_exclusive, present, writable, softdirty, uffd_wp;
> pmd_t pmdval;
> swp_entry_t entry;
> pmd_t pmdswp;
> @@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> if (!(pvmw->pmd && !pvmw->pte))
> return 0;
>
> - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> - if (unlikely(!pmd_present(*pvmw->pmd)))
> - pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> - else
> + present = pmd_present(*pvmw->pmd);
> + if (likely(present)) {
> + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> +
> pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
>
> + writable = pmd_write(pmdval);
> + softdirty = pmd_soft_dirty(pmdval);
> + uffd_wp = pmd_uffd_wp(pmdval);
> + } else {
> + softleaf_t old_entry;
> +
> + pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> + old_entry = softleaf_from_pmd(pmdval);
> +
> + writable = softleaf_is_device_private_write(old_entry);

Just to make sure I get it. This means the only possible writable
non-present/softleaf entry is device private writable. There is a
writable migration entry, but since we are setting a migration entry
here, that should not be possible.

The patch LGTM. Thanks.

Reviewed-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi

* Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
2026-06-02 14:40 ` Zi Yan
@ 2026-06-02 17:26 ` Lorenzo Stoakes
0 siblings, 0 replies; 19+ messages in thread
From: Lorenzo Stoakes @ 2026-06-02 17:26 UTC (permalink / raw)
To: Zi Yan
Cc: Andrew Morton, David Hildenbrand, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
SeongJae Park, Balbir Singh, linux-mm, linux-kernel
On Tue, Jun 02, 2026 at 10:40:16AM -0400, Zi Yan wrote:
> On 1 Jun 2026, at 4:30, Lorenzo Stoakes wrote:
>
> > Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> > device-private entries") updated set_pmd_migration_entry() to use
> > pmdp_huge_get_and_clear() in the softleaf case, but made no further
> > adjustments to the function itself.
> >
> > Therefore this function continues to incorrectly use pmd_write(),
> > pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> > migration entry should be marked writable, softdirty or uffd-wp
> > respectively.
> >
> > Whilst all are incorrect, the most problematic of these is pmd_write(), as
> > this can lead to corrupted rmap state.
> >
> > On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> > pmd_write() on a softleaf will return the softdirty state encoded in the
> > entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
> >
> > This was observed when running the hmm.hmm_device_private.anon_write_child
> > selftest:
> >
> > 1. The test faults in a range then migrates it such that a device-private
> > THP range is established.
> >
> > 2. The parent then migrates it to a device-private writable PMD entry whose
> > folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> > (accidentally correct write state).
> >
> > 3. The parent forks and the PMD entries are set to device-private read only
> > entries, entire_mapcount=2, softdirty still set.
> >
> > 4. [BUG] The child writes to the range then migrates to RAM - intending to
> > install non-writable migration entries - but replacing parent and child
> > PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> > bit.
> >
> > 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> > set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> > both parent and child, which are therefore AnonExclusive.
> >
> > 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> > entire_mapcount=2 and we end up with an AnonExclusive folio with
> > entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
> >
> > VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> > folio_entire_mapcount(folio) > 1 &&
> > PageAnonExclusive(cur_page), folio)
> >
> > This patch fixes the issue by correctly referencing the softleaf entry
> > fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
> >
> > It also only updates A/D flags if the entry is present as these are
> > otherwise not meaningful for a softleaf entry.
> >
> > This patch also flips the if (!present) { ... } else { ... } logic in
> > set_pmd_migration_entry() so it is easier to understand, and adds some
> > comments to make things clearer.
> >
> > I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> > device private THP test infrastructure") which first exposes this bug as it
> > was the commit that permitted test_hmm to generate the test.
> >
> > However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> > device-private entries") is the commit that actually enabled this
> > behaviour.
>
> Thanks for the detailed explanation.
> >
> > Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> > ---
> > mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
> > 1 file changed, 33 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index bf9b480bb3b0..79463c709c98 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > struct vm_area_struct *vma = pvmw->vma;
> > struct mm_struct *mm = vma->vm_mm;
> > unsigned long address = pvmw->address;
> > - bool anon_exclusive;
> > + bool anon_exclusive, present, writable, softdirty, uffd_wp;
> > pmd_t pmdval;
> > swp_entry_t entry;
> > pmd_t pmdswp;
> > @@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > if (!(pvmw->pmd && !pvmw->pte))
> > return 0;
> >
> > - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> > - if (unlikely(!pmd_present(*pvmw->pmd)))
> > - pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> > - else
> > + present = pmd_present(*pvmw->pmd);
> > + if (likely(present)) {
> > + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> > +
> > pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
> >
> > + writable = pmd_write(pmdval);
> > + softdirty = pmd_soft_dirty(pmdval);
> > + uffd_wp = pmd_uffd_wp(pmdval);
> > + } else {
> > + softleaf_t old_entry;
> > +
> > + pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> > + old_entry = softleaf_from_pmd(pmdval);
> > +
> > + writable = softleaf_is_device_private_write(old_entry);
>
> Just to make sure I get it. This means the only possible writable
> non-present/softleaf entry is device private writable. There is a
> writable migration entry, but since we are setting a migration entry
> here, that should not be possible.
Yes :)

This is doing the same as try_to_migrate_one(), e.g.:

	if (folio_test_hugetlb(folio)) {
		...
	} else if (likely(pte_present(pteval))) {
		...
	} else {
		const softleaf_t entry = softleaf_from_pte(pteval);
		pte_clear(mm, address, pvmw.pte);
		writable = softleaf_is_device_private_write(entry);
	}
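
For anyone following along, the aliasing described in the commit message can be sketched in user space. This is only a toy model, not kernel code: every MODEL_* name, the two-bit type field and the bit positions are assumptions made up for illustration. The one thing taken from the thread is that the swap soft-dirty bit shares a bit with RW, so a present-entry helper applied to a softleaf reads soft-dirty as "writable".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy user-space model, NOT kernel code. All MODEL_* names and the type
 * field layout are hypothetical; only the aliasing of the swap soft-dirty
 * bit onto the RW bit mirrors what the commit message describes for x86-64.
 */
#define MODEL_PAGE_PRESENT		(UINT64_C(1) << 0)
#define MODEL_PAGE_RW			(UINT64_C(1) << 1)
#define MODEL_PAGE_SWP_SOFT_DIRTY	MODEL_PAGE_RW
#define MODEL_TYPE_SHIFT		2
#define MODEL_TYPE_MASK			(UINT64_C(3) << MODEL_TYPE_SHIFT)

enum model_leaf_type {
	MODEL_DEVICE_PRIVATE_READ = 1,
	MODEL_DEVICE_PRIVATE_WRITE = 2,
};

/* Build a non-present (softleaf) entry of the given type. */
static uint64_t model_make_softleaf(enum model_leaf_type type, bool softdirty)
{
	uint64_t entry = (uint64_t)type << MODEL_TYPE_SHIFT;

	if (softdirty)
		entry |= MODEL_PAGE_SWP_SOFT_DIRTY;
	return entry;
}

/* What a present-entry helper such as pmd_write() effectively tests. */
static bool model_pmd_write(uint64_t entry)
{
	return (entry & MODEL_PAGE_RW) != 0;
}

/* Decode writability from the softleaf type, as the patch does. */
static bool model_softleaf_is_device_private_write(uint64_t entry)
{
	assert(!(entry & MODEL_PAGE_PRESENT));
	return ((entry & MODEL_TYPE_MASK) >> MODEL_TYPE_SHIFT) ==
		MODEL_DEVICE_PRIVATE_WRITE;
}

/* Soft-dirty for a softleaf lives in the swap soft-dirty bit. */
static bool model_softleaf_soft_dirty(uint64_t entry)
{
	assert(!(entry & MODEL_PAGE_PRESENT));
	return (entry & MODEL_PAGE_SWP_SOFT_DIRTY) != 0;
}
```

In this model a read-only, soft-dirty device-private entry makes model_pmd_write() return true while model_softleaf_is_device_private_write() returns false, which is the misread from step 4 of the commit message.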
>
> The patch LGTM. Thanks.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
Thanks!
>
>
> Best Regards,
> Yan, Zi
Cheers, Lorenzo