From: Lorenzo Stoakes <ljs@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R . Howlett" <liam@infradead.org>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
SeongJae Park <sj@kernel.org>, Balbir Singh <balbirs@nvidia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH mm-hotfixes] mm/huge_memory: use correct flags for device private PMD entry
Date: Mon, 1 Jun 2026 09:30:44 +0100
Message-ID: <20260601083044.57132-1-ljs@kernel.org>

Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
device-private entries") updated set_pmd_migration_entry() to use
pmdp_huge_get_and_clear() in the softleaf case, but made no further
adjustments to the function itself.

As a result, in the softleaf case the function still incorrectly uses
pmd_write(), pmd_soft_dirty() and pmd_uffd_wp() to determine whether the
installed migration entry should be marked writable, softdirty or uffd-wp
respectively.

Whilst all are incorrect, the most problematic of these is pmd_write(), as
this can lead to corrupted rmap state.

On x86-64, _PAGE_SWP_SOFT_DIRTY is aliased to _PAGE_RW, so calling
pmd_write() on a softleaf returns the softdirty state encoded in the
entry, assuming CONFIG_MEM_SOFT_DIRTY is enabled.
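
To illustrate the aliasing, here is a minimal userspace sketch. It is not
kernel code: the bit positions follow x86-64, but every MODEL_* and model_*
name is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only. Bit positions follow x86-64, but every MODEL_* and
 * model_* name is hypothetical and not a kernel symbol.
 */
#define MODEL_PAGE_PRESENT        (UINT64_C(1) << 0)
#define MODEL_PAGE_RW             (UINT64_C(1) << 1)
/* Assumes CONFIG_MEM_SOFT_DIRTY: the swap softdirty bit reuses the RW bit. */
#define MODEL_PAGE_SWP_SOFT_DIRTY MODEL_PAGE_RW

/* Helper meant for present entries: tests the RW bit. */
static inline bool model_pmd_write(uint64_t pmd)
{
	return (pmd & MODEL_PAGE_RW) != 0;
}

/* Helper meant for non-present (softleaf) entries. */
static inline bool model_pmd_swp_soft_dirty(uint64_t pmd)
{
	return (pmd & MODEL_PAGE_SWP_SOFT_DIRTY) != 0;
}

/* Build a non-present entry that carries only the softdirty bit. */
static inline uint64_t model_softdirty_softleaf(void)
{
	uint64_t pmd = 0;

	pmd |= MODEL_PAGE_SWP_SOFT_DIRTY;
	assert(!(pmd & MODEL_PAGE_PRESENT));
	return pmd;
}
```

In this model, calling model_pmd_write() on the value returned by
model_softdirty_softleaf() reports "writable" even though no write
permission was ever recorded, which is the misinterpretation described
above.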

This was observed when running the hmm.hmm_device_private.anon_write_child
selftest:

1. The test faults in a range then migrates it such that a device-private
   THP range is established.

2. The parent then migrates it to a device-private writable PMD entry whose
   folio is entirely AnonExclusive with entire_mapcount=1, softdirty set
   (accidentally correct write state).

3. The parent forks and the PMD entries are set to device-private read only
   entries, entire_mapcount=2, softdirty still set.

4. [BUG] The child writes to the range, then migrates it to RAM. This
   should install non-writable migration entries, but instead replaces
   both the parent and child PMD mappings with WRITABLE entries because
   the softdirty bit is misinterpreted as the write bit.

5. In remove_migration_pmd(), if !softleaf_is_migration_read(entry) we
   set the RMAP_EXCLUSIVE flag when calling folio_add_anon_rmap_pmd() for
   both parent and child, which are therefore AnonExclusive.

6. [SPLAT] Child sets migrated folio entire_mapcount=1, parent sets
   entire_mapcount=2 and we end up with an AnonExclusive folio with
   entire_mapcount=2! Assert fires in __folio_add_anon_rmap():

	VM_WARN_ON_FOLIO(folio_test_large(folio) &&
			 folio_entire_mapcount(folio) > 1 &&
			 PageAnonExclusive(cur_page), folio)
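
Steps 4 to 6 can be sketched as a small userspace state model. This is an
illustration under simplifying assumptions, not the kernel implementation:
all model_* names are hypothetical, and the folio is reduced to a mapcount
and a single exclusive flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three migration entry types. */
enum model_migration_type {
	MODEL_MIG_READ,
	MODEL_MIG_READ_EXCLUSIVE,
	MODEL_MIG_WRITE,
};

/* Simplified non-present (softleaf) PMD state. */
struct model_softleaf_pmd {
	bool device_private_write;	/* writable device-private entry */
	bool swp_soft_dirty;		/* softdirty bit of the entry */
};

/* Simplified destination folio state. */
struct model_folio {
	int entire_mapcount;
	bool anon_exclusive;
};

typedef enum model_migration_type (*model_pick_fn)(struct model_softleaf_pmd,
						   bool);

/* Buggy choice: softdirty is read as the write bit (x86-64 aliasing). */
static enum model_migration_type
model_pick_buggy(struct model_softleaf_pmd pmd, bool anon_exclusive)
{
	if (pmd.swp_soft_dirty)
		return MODEL_MIG_WRITE;
	return anon_exclusive ? MODEL_MIG_READ_EXCLUSIVE : MODEL_MIG_READ;
}

/* Fixed choice: writability comes from the softleaf entry type. */
static enum model_migration_type
model_pick_fixed(struct model_softleaf_pmd pmd, bool anon_exclusive)
{
	if (pmd.device_private_write)
		return MODEL_MIG_WRITE;
	return anon_exclusive ? MODEL_MIG_READ_EXCLUSIVE : MODEL_MIG_READ;
}

/* Models step 5: any entry that is not a plain read entry maps exclusive. */
static void model_restore_mapping(struct model_folio *folio,
				  enum model_migration_type type)
{
	folio->entire_mapcount++;
	if (type != MODEL_MIG_READ)
		folio->anon_exclusive = true;
}

/* Models the condition that the assert in step 6 warns about. */
static bool model_folio_is_inconsistent(const struct model_folio *folio)
{
	return folio->entire_mapcount > 1 && folio->anon_exclusive;
}

/* Models steps 4 to 6 for a parent and child sharing a folio after fork. */
static struct model_folio model_migrate_after_fork(model_pick_fn pick)
{
	/* After fork: read-only device-private entries, softdirty still set. */
	struct model_softleaf_pmd parent = {
		.device_private_write = false,
		.swp_soft_dirty = true,
	};
	struct model_softleaf_pmd child = parent;
	struct model_folio dst = { 0, false };
	/* The source folio is shared after fork, so it is not exclusive. */
	bool anon_exclusive = false;

	assert(pick);
	model_restore_mapping(&dst, pick(child, anon_exclusive));
	model_restore_mapping(&dst, pick(parent, anon_exclusive));
	return dst;
}
```

Running model_migrate_after_fork() with model_pick_buggy leaves the model
folio exclusive with a mapcount of 2, the inconsistent state from step 6,
whereas model_pick_fixed leaves it shared with the same mapcount.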

This patch fixes the issue by correctly referencing the softleaf entry
fields for writable, softdirty and uffd-wp in set_pmd_migration_entry().

It also only updates A/D flags if the entry is present, as these are
otherwise not meaningful for a softleaf entry.

This patch also flips the if (!present) { ... } else { ... } logic in
set_pmd_migration_entry() so it is easier to understand, and adds some
comments to make things clearer.

I was able to bisect this to commit 775465fd26a3 ("lib/test_hmm: add zone
device private THP test infrastructure"), which first exposed this bug as
it was the commit that allowed test_hmm to exercise this case.

However, commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
device-private entries") is the commit that actually enabled this
behaviour.

Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
Cc: stable@vger.kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
---
mm/huge_memory.c | 45 +++++++++++++++++++++++++++++++++------------
1 file changed, 33 insertions(+), 12 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bf9b480bb3b0..79463c709c98 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4982,7 +4982,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
struct vm_area_struct *vma = pvmw->vma;
struct mm_struct *mm = vma->vm_mm;
unsigned long address = pvmw->address;
- bool anon_exclusive;
+ bool anon_exclusive, present, writable, softdirty, uffd_wp;
pmd_t pmdval;
swp_entry_t entry;
pmd_t pmdswp;
@@ -4990,12 +4990,26 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
if (!(pvmw->pmd && !pvmw->pte))
return 0;
- flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
- if (unlikely(!pmd_present(*pvmw->pmd)))
- pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
- else
+ present = pmd_present(*pvmw->pmd);
+ if (likely(present)) {
+ flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
+
pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
+ writable = pmd_write(pmdval);
+ softdirty = pmd_soft_dirty(pmdval);
+ uffd_wp = pmd_uffd_wp(pmdval);
+ } else {
+ softleaf_t old_entry;
+
+ pmdval = pmdp_huge_get_and_clear(vma->vm_mm, address, pvmw->pmd);
+ old_entry = softleaf_from_pmd(pmdval);
+
+ writable = softleaf_is_device_private_write(old_entry);
+ softdirty = pmd_swp_soft_dirty(pmdval);
+ uffd_wp = pmd_swp_uffd_wp(pmdval);
+ }
+
/* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) {
@@ -5003,24 +5017,31 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
return -EBUSY;
}
- if (pmd_dirty(pmdval))
- folio_mark_dirty(folio);
- if (pmd_write(pmdval))
+ /* Determine type of migration entry. */
+ if (writable)
entry = make_writable_migration_entry(page_to_pfn(page));
else if (anon_exclusive)
entry = make_readable_exclusive_migration_entry(page_to_pfn(page));
else
entry = make_readable_migration_entry(page_to_pfn(page));
- if (pmd_young(pmdval))
+
+ /* Set A/D bits as necessary. */
+ if (present && pmd_young(pmdval))
entry = make_migration_entry_young(entry);
- if (pmd_dirty(pmdval))
+ if (present && pmd_dirty(pmdval)) {
+ folio_mark_dirty(folio);
entry = make_migration_entry_dirty(entry);
+ }
+
+ /* Set PMD. */
pmdswp = swp_entry_to_pmd(entry);
- if (pmd_soft_dirty(pmdval))
+ if (softdirty)
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
- if (pmd_uffd_wp(pmdval))
+ if (uffd_wp)
pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
+
+ /* Migration entry installed: cleanup rmap, folio. */
folio_remove_rmap_pmd(folio, page, vma);
folio_put(folio);
trace_set_migration_pmd(address, pmd_val(pmdswp));
--
2.54.0