From: Lorenzo Stoakes <ljs@kernel.org>
To: Balbir Singh <balbirs@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R . Howlett" <liam@infradead.org>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>,
Lance Yang <lance.yang@linux.dev>, SeongJae Park <sj@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
Date: Tue, 2 Jun 2026 10:17:09 +0100
Message-ID: <ah6byrDr-DPECvdf@lucifer>
In-Reply-To: <8f7744e0-5729-4862-b5b0-401c2bca4d50@nvidia.com>

On Tue, Jun 02, 2026 at 06:30:45AM +1000, Balbir Singh wrote:
> On 6/1/26 18:30, Lorenzo Stoakes wrote:
> > Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> > device-private entries") updated set_pmd_migration_entry() to use
> > pmdp_huge_get_and_clear() in the softleaf case, but made no further
> > adjustments to the function itself.
> >
> > Therefore this function continues to incorrectly use pmd_write(),
> > pmd_soft_dirty() and pmd_uffd_wp() to determine whether the installed
> > migration entry should be marked writable, softdirty or uffd-wp
> > respectively.
> >
> > Whilst all are incorrect, the most problematic of these is pmd_write(), as
> > this can lead to corrupted rmap state.
> >
> > On x86-64 _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW. So calling
> > pmd_write() on a softleaf will return the softdirty state encoded in the
> > entry, assuming CONFIG_MEM_SOFT_DIRTY was enabled.
> >
> > This was observed when running the hmm.hmm_device_private.anon_write_child
> > selftest:
> >
> > 1. The test faults in a range then migrates it such that a device-private
> > THP range is established.
> >
> > 2. The parent then migrates it to a device-private writable PMD entry whose
> > folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
> > (accidentally correct write state).
> >
> > 3. The parent forks and the PMD entries are set to device-private read only
> > entries, entire_mapcount=2, softdirty still set.
> >
> > 4. [BUG] The child writes to the range then migrates to RAM - intending to
> > install non-writable migration entries - but replacing parent and child
> > PMD mappings with WRITABLE entries due to misinterpreting the softdirty
> > bit.
> >
> > 5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
> > set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
> > both parent and child, which are therefore AnonExclusive.
> >
> > 6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
> > entire_mapcount=2 and we end up with an AnonExclusive folio with
> > entire_mapcount=2! Assert fires in __folio_add_anon_rmap():
> >
> > VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> > folio_entire_mapcount(folio) > 1 &&
> > PageAnonExclusive(cur_page), folio)
> >
>
> Thanks for the explanation, I wonder why I've not run into this during
> my testing, I do have DEBUG_VM enabled in my config. I wonder if I've
> never had soft dirty set

No worries! I happened to hit it when reviewing a patch and testing
locally.

Yeah I did wonder why others didn't hit it - I guess the HMM tests are
easily skipped if the module wasn't built, for one, and perhaps either
CONFIG_DEBUG_VM or CONFIG_MEM_SOFT_DIRTY was not set?

There also might be some other factor that my config happens to trigger
that others do not?

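As a rough illustration of why CONFIG_MEM_SOFT_DIRTY matters here, the
userspace sketch below models the x86-64 aliasing described in the quoted
commit message, where the soft-dirty flag of a non-present entry shares a
bit with the RW flag of a present one. All toy_* names are hypothetical
stand-ins, not kernel APIs, and the entry layout is heavily simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout; the toy_/TOY_ names are not kernel symbols. */
#define TOY_PAGE_PRESENT	(UINT64_C(1) << 0)
#define TOY_PAGE_RW		(UINT64_C(1) << 1)
/* For non-present entries, soft-dirty reuses the RW bit. */
#define TOY_PAGE_SWP_SOFT_DIRTY	TOY_PAGE_RW

typedef struct { uint64_t val; } toy_pmd_t;

static inline bool toy_pmd_present(toy_pmd_t pmd)
{
	return (pmd.val & TOY_PAGE_PRESENT) != 0;
}

/* Only meaningful for present entries. */
static inline bool toy_pmd_write(toy_pmd_t pmd)
{
	return (pmd.val & TOY_PAGE_RW) != 0;
}

/* Only meaningful for non-present entries. */
static inline bool toy_pmd_swp_soft_dirty(toy_pmd_t pmd)
{
	return (pmd.val & TOY_PAGE_SWP_SOFT_DIRTY) != 0;
}

/* Build a non-present, read-only entry, optionally marked soft-dirty. */
static inline toy_pmd_t toy_make_readonly_softleaf(bool soft_dirty)
{
	toy_pmd_t pmd = { .val = 0 };

	if (soft_dirty)
		pmd.val |= TOY_PAGE_SWP_SOFT_DIRTY;
	return pmd;
}

/* Old behaviour: always trust the present-entry helper. */
static inline bool toy_writable_buggy(toy_pmd_t pmd)
{
	return toy_pmd_write(pmd);
}

/*
 * Fixed behaviour: for a non-present entry, take the write state from the
 * decoded entry type (passed in here as a plain bool for simplicity).
 */
static inline bool toy_writable_fixed(toy_pmd_t pmd, bool entry_is_write)
{
	return toy_pmd_present(pmd) ? toy_pmd_write(pmd) : entry_is_write;
}
```

In this model a read-only, soft-dirty, non-present entry is reported as
writable by toy_writable_buggy() but not by toy_writable_fixed(), and the
two only disagree when the soft-dirty bit is actually set.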
>
> > This patch fixes the issue by correctly referencing the softleaf entry
> > fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().
> >
> > It also only updates A/D flags if the entry is present as these are
> > otherwise not meaningful for a softleaf entry.
> >
> > This patch also flips the if (!present) { ... } else { ... } logic in
> > set_pmd_migration_entry() so it is easier to understand, and adds some
> > comments to make things clearer.
> >
> > I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
> > device private THP test infrastructure") which first exposes this bug as it
> > was the commit that permitted test_hmm to generate the test.
> >
> > However commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
> > device-private entries") is the commit that actually enabled this
> > behaviour.
> >
> > Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> > ---
> > mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
> > 1 file changed, 33 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index bf9b480bb3b0..79463c709c98 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > struct vm_area_struct *vma = pvmw->vma;
> > struct mm_struct *mm = vma->vm_mm;
> > unsigned long address = pvmw->address;
> > - bool anon_exclusive;
> > + bool anon_exclusive, present, writable, softdirty, uffd_wp;
> > pmd_t pmdval;
> > swp_entry_t entry;
> > pmd_t pmdswp;
> > @@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > if (!(pvmw->pmd && !pvmw->pte))
> > return 0;
> >
> > - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> > - if (unlikely(!pmd_present(*pvmw->pmd)))
> > - pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> > - else
> > + present = pmd_present(*pvmw->pmd);
> > + if (likely(present)) {
> > + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
> > +
> > pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
> >
> > + writable = pmd_write(pmdval);
> > + softdirty = pmd_soft_dirty(pmdval);
> > + uffd_wp = pmd_uffd_wp(pmdval);
> > + } else {
> > + softleaf_t old_entry;
> > +
> > + pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
> > + old_entry = softleaf_from_pmd(pmdval);
> > +
> > + writable = softleaf_is_device_private_write(old_entry);
> > + softdirty = pmd_swp_soft_dirty(pmdval);
> > + uffd_wp = pmd_swp_uffd_wp(pmdval);
> > + }
> > +
> > /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
> > anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
> > if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) {
> > @@ -5003,24 +5017,31 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
> > return -EBUSY;
> > }
> >
> > - if (pmd_dirty(pmdval))
> > - folio_mark_dirty(folio);
> > - if (pmd_write(pmdval))
> > + /* Determine type of migration entry. */
> > + if (writable)
> > entry = make_writable_migration_entry(page_to_pfn(page));
> > else if (anon_exclusive)
> > entry = make_readable_exclusive_migration_entry(page_to_pfn(page));
> > else
> > entry = make_readable_migration_entry(page_to_pfn(page));
> > - if (pmd_young(pmdval))
> > +
> > + /* Set A/D bits as necessary. */
> > + if (present && pmd_young(pmdval))
> > entry = make_migration_entry_young(entry);
> > - if (pmd_dirty(pmdval))
> > + if (present && pmd_dirty(pmdval)) {
> > + folio_mark_dirty(folio);
> > entry = make_migration_entry_dirty(entry);
> > + }
> > +
> > + /* Set PMD. */
> > pmdswp = swp_entry_to_pmd(entry);
> > - if (pmd_soft_dirty(pmdval))
> > + if (softdirty)
> > pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> > - if (pmd_uffd_wp(pmdval))
> > + if (uffd_wp)
> > pmdswp = pmd_swp_mkuffd_wp(pmdswp);
> > set_pmd_at(mm, address, pvmw->pmd, pmdswp);
> > +
> > + /* Migration entry installed: cleanup rmap, folio. */
> > folio_remove_rmap_pmd(folio, page, vma);
> > folio_put(folio);
> > trace_set_migration_pmd(address, pmd_val(pmdswp));
> > --
>
>
> Reviewed-by: Balbir Singh <balbirs@nvidia.com>

Thanks!

Cheers, Lorenzo
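
For readers following steps 5 and 6 of the quoted commit message, the
sketch below is a toy userspace model of how restoring two mappings from
non-read migration entries leaves an exclusive folio with
entire_mapcount=2. The toy_* names are hypothetical, and the real checks
live in __folio_add_anon_rmap() and operate on per-page state, so treat
this as an approximation only.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the folio state discussed in the thread. */
struct toy_folio {
	int entire_mapcount;
	bool anon_exclusive;
};

enum toy_migration_type {
	TOY_MIGRATION_READ,
	TOY_MIGRATION_READ_EXCLUSIVE,
	TOY_MIGRATION_WRITE,
};

/*
 * Restore one PMD mapping from a migration entry. Any non-read entry is
 * treated as exclusive, mirroring the !softleaf_is_migration_read() test
 * described in step 5. Returns false when the toy equivalent of the
 * VM_WARN_ON_FOLIO() in step 6 would fire.
 */
static inline bool toy_remove_migration_pmd(struct toy_folio *folio,
					    enum toy_migration_type type)
{
	folio->entire_mapcount++;
	if (type != TOY_MIGRATION_READ)
		folio->anon_exclusive = true;

	return !(folio->entire_mapcount > 1 && folio->anon_exclusive);
}
```

With both parent and child restored from TOY_MIGRATION_WRITE entries the
second call reports a violation, whereas two TOY_MIGRATION_READ entries
end up with entire_mapcount=2 and no exclusive marking.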