From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,stable@vger.kernel.org,ryan.roberts@arm.com,david@redhat.com,arkamar@atlas.cz,akpm@linux-foundation.org
Subject: [merged mm-hotfixes-stable] mm-fix-folio_pte_batch-on-xen-pv.patch removed from -mm tree
Date: Wed, 07 May 2025 23:41:08 -0700 [thread overview]
Message-ID: <20250508064109.12711C4CEEB@smtp.kernel.org> (raw)
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 6450 bytes --]
The quilt patch titled
Subject: mm: fix folio_pte_batch() on XEN PV
has been removed from the -mm tree. Its filename was
mm-fix-folio_pte_batch-on-xen-pv.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Petr Vaněk <arkamar@atlas.cz>
Subject: mm: fix folio_pte_batch() on XEN PV
Date: Fri, 2 May 2025 23:50:19 +0200
On XEN PV, folio_pte_batch() can incorrectly batch beyond the end of a
folio due to a corner case in pte_advance_pfn(). Specifically, when the
PFN following the folio maps to an invalidated MFN,
expected_pte = pte_advance_pfn(expected_pte, nr);
produces a pte_none(). If the actual next PTE in memory is also
pte_none(), the pte_same() succeeds,
if (!pte_same(pte, expected_pte))
break;
the loop is not broken, and batching continues into unrelated memory.
For example, with a 4-page folio, the PTE layout might look like this:
[ 53.465673] [ T2552] folio_pte_batch: printing PTE values at addr=0x7f1ac9dc5000
[ 53.465674] [ T2552] PTE[453] = 000000010085c125
[ 53.465679] [ T2552] PTE[454] = 000000010085d125
[ 53.465682] [ T2552] PTE[455] = 000000010085e125
[ 53.465684] [ T2552] PTE[456] = 000000010085f125
[ 53.465686] [ T2552] PTE[457] = 0000000000000000 <-- not present
[ 53.465689] [ T2552] PTE[458] = 0000000101da7125
pte_advance_pfn(PTE[456]) returns a pte_none() due to invalid PFN->MFN
mapping. The next actual PTE (PTE[457]) is also pte_none(), so the loop
continues and includes PTE[457] in the batch, resulting in 5 batched
entries for a 4-page folio. This triggers the following warning:
[ 53.465751] [ T2552] page: refcount:85 mapcount:20 mapping:ffff88813ff4f6a8 index:0x110 pfn:0x10085c
[ 53.465754] [ T2552] head: order:2 mapcount:80 entire_mapcount:0 nr_pages_mapped:4 pincount:0
[ 53.465756] [ T2552] memcg:ffff888003573000
[ 53.465758] [ T2552] aops:0xffffffff8226fd20 ino:82467c dentry name(?):"libc.so.6"
[ 53.465761] [ T2552] flags: 0x2000000000416c(referenced|uptodate|lru|active|private|head|node=0|zone=2)
[ 53.465764] [ T2552] raw: 002000000000416c ffffea0004021f08 ffffea0004021908 ffff88813ff4f6a8
[ 53.465767] [ T2552] raw: 0000000000000110 ffff888133d8bd40 0000005500000013 ffff888003573000
[ 53.465768] [ T2552] head: 002000000000416c ffffea0004021f08 ffffea0004021908 ffff88813ff4f6a8
[ 53.465770] [ T2552] head: 0000000000000110 ffff888133d8bd40 0000005500000013 ffff888003573000
[ 53.465772] [ T2552] head: 0020000000000202 ffffea0004021701 000000040000004f 00000000ffffffff
[ 53.465774] [ T2552] head: 0000000300000003 8000000300000002 0000000000000013 0000000000000004
[ 53.465775] [ T2552] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1), const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *: (struct folio *)_compound_head(page + nr_pages - 1))) != folio)
Original code works as expected everywhere, except on XEN PV, where
pte_advance_pfn() can yield a pte_none() after balloon inflation due to
MFNs invalidation. In XEN, pte_advance_pfn() ends up calling
__pte()->xen_make_pte()->pte_pfn_to_mfn(), which returns pte_none() when
mfn == INVALID_P2M_ENTRY.
The pte_pfn_to_mfn() documents that nastiness:
If there's no mfn for the pfn, then just create an
empty non-present pte. Unfortunately this loses
information about the original pfn, so
pte_mfn_to_pfn is asymmetric.
While such hacks should certainly be removed, we can do better in
folio_pte_batch() and simply check ahead of time how many PTEs we can
possibly batch in our folio.
This way, we can not only fix the issue but cleanup the code: removing the
pte_pfn() check inside the loop body and avoiding end_ptr comparison +
arithmetic.
Link: https://lkml.kernel.org/r/20250502215019.822-2-arkamar@atlas.cz
Fixes: f8d937761d65 ("mm/memory: optimize fork() with PTE-mapped THP")
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/internal.h | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
--- a/mm/internal.h~mm-fix-folio_pte_batch-on-xen-pv
+++ a/mm/internal.h
@@ -248,11 +248,9 @@ static inline int folio_pte_batch(struct
pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
bool *any_writable, bool *any_young, bool *any_dirty)
{
- unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
- const pte_t *end_ptep = start_ptep + max_nr;
pte_t expected_pte, *ptep;
bool writable, young, dirty;
- int nr;
+ int nr, cur_nr;
if (any_writable)
*any_writable = false;
@@ -265,11 +263,15 @@ static inline int folio_pte_batch(struct
VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
+ /* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
+ max_nr = min_t(unsigned long, max_nr,
+ folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
+
nr = pte_batch_hint(start_ptep, pte);
expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
ptep = start_ptep + nr;
- while (ptep < end_ptep) {
+ while (nr < max_nr) {
pte = ptep_get(ptep);
if (any_writable)
writable = !!pte_write(pte);
@@ -282,14 +284,6 @@ static inline int folio_pte_batch(struct
if (!pte_same(pte, expected_pte))
break;
- /*
- * Stop immediately once we reached the end of the folio. In
- * corner cases the next PFN might fall into a different
- * folio.
- */
- if (pte_pfn(pte) >= folio_end_pfn)
- break;
-
if (any_writable)
*any_writable |= writable;
if (any_young)
@@ -297,12 +291,13 @@ static inline int folio_pte_batch(struct
if (any_dirty)
*any_dirty |= dirty;
- nr = pte_batch_hint(ptep, pte);
- expected_pte = pte_advance_pfn(expected_pte, nr);
- ptep += nr;
+ cur_nr = pte_batch_hint(ptep, pte);
+ expected_pte = pte_advance_pfn(expected_pte, cur_nr);
+ ptep += cur_nr;
+ nr += cur_nr;
}
- return min(ptep - start_ptep, max_nr);
+ return min(nr, max_nr);
}
/**
_
Patches currently in -mm which might be from arkamar@atlas.cz are
reply other threads:[~2025-05-08 6:41 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250508064109.12711C4CEEB@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=arkamar@atlas.cz \
--cc=david@redhat.com \
--cc=mm-commits@vger.kernel.org \
--cc=ryan.roberts@arm.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.