From: Lorenzo Stoakes <ljs@kernel.org>
To: Zi Yan <ziy@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R . Howlett" <liam@infradead.org>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>,
Lance Yang <lance.yang@linux.dev>, SeongJae Park <sj@kernel.org>,
Balbir Singh <balbirs@nvidia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
Date: Tue, 2 Jun 2026 18:26:46 +0100
Message-ID: <ah8KjaYmAIqR8s5k@lucifer>
In-Reply-To: <263FB5F0-AA3C-4885-86E2-9EDB030A0CDF@nvidia.com>

On Tue, Jun 02, 2026 at 10:40:16AM -0400, Zi Yan wrote:
> On 1 Jun 2026, at 4:30, Lorenzo Stoakes wrote:
>
> > Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> > device-private entries") updated set_pmd_migration_entry() to use
> > pmdp_huge_get_and_clear() in the softleaf case, but made no further
> > adjustments to the function itself.
> >
> > Therefore this function continues to incorrectly use pmd_write(),
> > pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> > migration entry should be marked writable, softdirty or uffd-wp
> > respectively.
> >
> > Whilst all are incorrect, the most problematic of these is pmd_write(), as
> > this can lead to corrupted rmap state.
> >
> > On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> > pmd_write() on a softleaf will return the softdirty state encoded in the
> > entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
> >
> > This was observed when running the hmm.hmm_device_private.anon_write_child
> > selftest:
> >
> > 1. The test faults in a range then migrates it such that a device-private
> > THP range is established.
> >
> > 2. The parent then migrates it to a device-private writable PMD entry whose
> > folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> > (accidentally correct write state).
> >
> > 3. The parent forks and the PMD entries are set to device-private read only
> > entries, entire_mapcount=2, softdirty still set.
> >
> > 4. [BUG] The child writes to the range then migrates to RAM - intending to
> > install non-writable migration entries - but replacing parent and child
> > PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> > bit.
> >
> > 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> > set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> > both parent and child, which are therefore AnonExclusive.
> >
> > 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> > entire_mapcount=2 and we end up with an AnonExclusive folio with
> > entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
> >
> > VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> > folio_entire_mapcount(folio) > 1 &&
> > PageAnonExclusive(cur_page), folio)
> >
> > This patch fixes the issue by correctly referencing the softleaf entry
> > fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
> >
> > It also only updates A/D flags if the entry is present as these are
> > otherwise not meaningful for a softleaf entry.
> >
> > This patch also flips the if (!present) { ... } else { ... } logic in
> > set_pmd_migration_entry() so it is easier to understand, and adds some
> > comments to make things clearer.
> >
> > I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> > device private THP test infrastructure") which first exposes this bug as it
> > was the commit that permitted test_hmm to generate the test.
> >
> > However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> > device-private entries") is the commit that actually enabled this
> > behaviour.
>
> Thanks for the detailed explanation.
> >
> > Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> > ---
> > mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
> > 1 file changed, 33 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index bf9b480bb3b0..79463c709c98 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > struct vm_area_struct *vma = pvmw->vma;
> > struct mm_struct *mm = vma->vm_mm;
> > unsigned long address = pvmw->address;
> > - bool anon_exclusive;
> > + bool anon_exclusive, present, writable, softdirty, uffd_wp;
> > pmd_t pmdval;
> > swp_entry_t entry;
> > pmd_t pmdswp;
> > @@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > if (!(pvmw->pmd && !pvmw->pte))
> > return 0;
> >
> > - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> > - if (unlikely(!pmd_present(*pvmw->pmd)))
> > - pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> > - else
> > + present = pmd_present(*pvmw->pmd);
> > + if (likely(present)) {
> > + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> > +
> > pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
> >
> > + writable = pmd_write(pmdval);
> > + softdirty = pmd_soft_dirty(pmdval);
> > + uffd_wp = pmd_uffd_wp(pmdval);
> > + } else {
> > + softleaf_t old_entry;
> > +
> > + pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> > + old_entry = softleaf_from_pmd(pmdval);
> > +
> > + writable = softleaf_is_device_private_write(old_entry);
>
> Just to make sure I get it. This means the only possible writable
> non present/softleaf entry is device private writable. There is
> writable migration entry, but since we are setting a migration entry
> here, that should not be possible.

Yes :)

This is doing the same as try_to_migrate_one(), e.g.:

	if (folio_test_hugetlb(folio)) {
		...
	} else if (likely(pte_present(pteval))) {
		...
	} else {
		const softleaf_t entry = softleaf_from_pte(pteval);

		pte_clear(mm, address, pvmw.pte);
		writable = softleaf_is_device_private_write(entry);
	}
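
To illustrate why the non-present branch has to decode the softleaf rather
than call pmd_write(), here is a minimal userspace sketch of the aliasing
described in the commit message. It is not kernel code: every sketch_* and
SKETCH_* name is hypothetical, and the placement of the entry type at bit 8
is an assumption made purely for illustration. The only detail taken from
the thread is that the swap soft-dirty bit shares the RW bit on x86-64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout only; these are not the kernel's definitions. */
#define SKETCH_PAGE_RW             (UINT64_C(1) << 1)
/* As in the commit message: the swap soft-dirty bit aliases the RW bit. */
#define SKETCH_PAGE_SWP_SOFT_DIRTY SKETCH_PAGE_RW
/* Assumption: the entry type lives at bit 8 of a non-present value. */
#define SKETCH_TYPE_SHIFT          8

enum sketch_leaf_type {
	SKETCH_DEVICE_PRIVATE_READ  = 0,
	SKETCH_DEVICE_PRIVATE_WRITE = 1,
};

struct sketch_softleaf {
	enum sketch_leaf_type type;
	bool soft_dirty;
};

/* Encode a non-present entry: the type carries writability. */
static uint64_t sketch_encode(struct sketch_softleaf e)
{
	uint64_t val = (uint64_t)e.type << SKETCH_TYPE_SHIFT;

	if (e.soft_dirty)
		val |= SKETCH_PAGE_SWP_SOFT_DIRTY;
	return val;
}

/* Stand-in for pmd_write(): only meaningful for present entries. */
static bool sketch_pmd_write(uint64_t val)
{
	return (val & SKETCH_PAGE_RW) != 0;
}

/* Stand-in for softleaf_is_device_private_write(): inspects the type. */
static bool sketch_leaf_is_write(uint64_t val)
{
	return ((val >> SKETCH_TYPE_SHIFT) & 1) == SKETCH_DEVICE_PRIVATE_WRITE;
}
```

With a read-only, soft-dirty entry, sketch_pmd_write() reports true because
it is really reading the soft-dirty bit, whereas sketch_leaf_is_write()
reports false. That mismatch is the misinterpretation in step 4 of the
commit message.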
>
> The patch LGTM. Thanks.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
Thanks!
>
>
> Best Regards,
> Yan, Zi
Cheers, Lorenzo