The Linux Kernel Mailing List
* [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading
@ 2026-06-25  1:47 Ye Liu
  2026-06-25  1:47 ` [PATCH 1/2] mm/page_owner: clamp skip_buddy_pages() PFN advance at MAX_ORDER_NR_PAGES boundary Ye Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Ye Liu @ 2026-06-25  1:47 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka
  Cc: Ye Liu, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, linux-mm, linux-kernel

Fix two TOCTOU races found during review of [1].

page_owner reads page state locklessly by design. In two places the
code reads the same metadata twice — once as a guard, then again as
a use — and the page can be concurrently reallocated between the two:

Patch 1: buddy_order_unsafe() in skip_buddy_pages() can return garbage
if the page is allocated between PageBuddy() and the private read,
causing the PFN to skip past a pfn_valid() boundary.  Clamp the
advance at MAX_ORDER_NR_PAGES.

Patch 2: PageMemcgKmem() in print_page_owner_memcg() re-reads
folio->memcg_data and triggers VM_BUG_ON assertions if the page
became a tail page or slab page.  Use the snapshot taken at entry.

[1] https://lore.kernel.org/all/20260623065234.31866-2-ye.liu@linux.dev/
[2] https://sashiko.dev/#/patchset/20260623065234.31866-2-ye.liu@linux.dev

Ye Liu (2):
  mm/page_owner: clamp skip_buddy_pages() PFN advance at
    MAX_ORDER_NR_PAGES boundary
  mm/page_owner: use memcg_data snapshot instead of PageMemcgKmem() to
    avoid TOCTOU VM_BUG_ON

 mm/page_owner.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

--
2.43.0



* [PATCH 1/2] mm/page_owner: clamp skip_buddy_pages() PFN advance at MAX_ORDER_NR_PAGES boundary
  2026-06-25  1:47 [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading Ye Liu
@ 2026-06-25  1:47 ` Ye Liu
  2026-06-25  1:47 ` [PATCH 2/2] mm/page_owner: use memcg_data snapshot instead of PageMemcgKmem() to avoid TOCTOU VM_BUG_ON Ye Liu
  2026-06-25  2:04 ` [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading Andrew Morton
  2 siblings, 0 replies; 4+ messages in thread
From: Ye Liu @ 2026-06-25  1:47 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka
  Cc: Ye Liu, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, linux-mm, linux-kernel

The lockless buddy_order_unsafe() read can return a garbage order
value if the page is concurrently allocated between the PageBuddy
check and the private read.  If this bogus order is <= MAX_PAGE_ORDER,
skip_buddy_pages() would arbitrarily advance the PFN, potentially
jumping past a MAX_ORDER_NR_PAGES boundary whose pfn_valid() check
would have caught an offline memory section.

In read_page_owner(), which relies solely on boundary-aligned
pfn_valid() to guard pfn_to_page(), skipping the boundary could
cause pfn_to_page() to access an unmapped mem_section.

Clamp the advance so it never crosses the next MAX_ORDER_NR_PAGES
boundary.  This is safe for all three callers: the pageblock-iterating
ones already handle boundary transitions in their outer loops, and
for read_page_owner() the worst case is one extra PageBuddy check per
MAX_ORDER_NR_PAGES (1024 pages with the default MAX_PAGE_ORDER of 10)
for a huge buddy block straddling the boundary.

Signed-off-by: Ye Liu <ye.liu@linux.dev>
---
 mm/page_owner.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index ec9600025127..5c403bce35ce 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -435,6 +435,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
  * to skip less than the full buddy block, but that is acceptable for page owner
  * iteration purposes.
  *
+ * The lockless read of buddy_order_unsafe() can also return a garbage order if
+ * the page is concurrently allocated and PageBuddy is cleared between the check
+ * and the read. Clamp the advance at the next MAX_ORDER_NR_PAGES boundary so
+ * that a bogus order cannot carry @pfn into an unvalidated memory section,
+ * which would break callers that rely on boundary-aligned pfn_valid() checks.
+ *
  * Return: true if the page was skipped (caller should continue its loop),
  *         false if the page is not a buddy page and should be processed normally.
  */
@@ -446,8 +452,12 @@ static inline bool skip_buddy_pages(unsigned long *pfn, struct page *page)
 		return false;
 
 	order = buddy_order_unsafe(page);
-	if (order <= MAX_PAGE_ORDER)
-		*pfn += (1UL << order) - 1;
+	if (order <= MAX_PAGE_ORDER) {
+		unsigned long new_pfn = *pfn + (1UL << order);
+		unsigned long boundary = ALIGN(*pfn + 1, MAX_ORDER_NR_PAGES);
+
+		*pfn = min(new_pfn, boundary) - 1;
+	}
 
 	return true;
 }
-- 
2.43.0



* [PATCH 2/2] mm/page_owner: use memcg_data snapshot instead of PageMemcgKmem() to avoid TOCTOU VM_BUG_ON
  2026-06-25  1:47 [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading Ye Liu
  2026-06-25  1:47 ` [PATCH 1/2] mm/page_owner: clamp skip_buddy_pages() PFN advance at MAX_ORDER_NR_PAGES boundary Ye Liu
@ 2026-06-25  1:47 ` Ye Liu
  2026-06-25  2:04 ` [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading Andrew Morton
  2 siblings, 0 replies; 4+ messages in thread
From: Ye Liu @ 2026-06-25  1:47 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka
  Cc: Ye Liu, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, linux-mm, linux-kernel

print_page_owner_memcg() takes a snapshot of page->memcg_data via
READ_ONCE() at the top of the function and guards against tail pages
and NULL memcg_data.  However, at the end it calls PageMemcgKmem(page),
which internally calls folio_memcg_kmem().  That function re-reads
folio->memcg_data and page->compound_head locklessly, and wraps both
reads in VM_BUG_ON assertions:

    VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page);
    VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio);

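If the page is concurrently freed and reallocated as a THP tail page
or a slab page between the initial guards and this final call, these
assertions can fire on debug builds and trigger a BUG(), oopsing the
kernel (or panicking it with panic_on_oops).  VM_BUG_ON_FOLIO() is
active with CONFIG_DEBUG_VM=y; VM_BUG_ON_PGFLAGS() additionally
requires CONFIG_DEBUG_VM_PGFLAGS=y.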

Fix this by testing MEMCG_DATA_KMEM on the memcg_data snapshot taken
at function entry instead of calling PageMemcgKmem().  This is
semantically equivalent, because PageMemcgKmem() reduces to
folio_memcg_kmem(), which returns folio->memcg_data & MEMCG_DATA_KMEM.
It closes the TOCTOU window and avoids the assertions entirely.

Signed-off-by: Ye Liu <ye.liu@linux.dev>
---
 mm/page_owner.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 5c403bce35ce..b3252ebc0307 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -568,7 +568,7 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
 	cgroup_name(memcg->css.cgroup, name, sizeof(name));
 	ret += scnprintf(kbuf + ret, count - ret,
 			"Charged %sto %smemcg %s\n",
-			PageMemcgKmem(page) ? "(via objcg) " : "",
+			(memcg_data & MEMCG_DATA_KMEM) ? "(via objcg) " : "",
 			online ? "" : "offline ",
 			name);
 out_unlock:
-- 
2.43.0



* Re: [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading
  2026-06-25  1:47 [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading Ye Liu
  2026-06-25  1:47 ` [PATCH 1/2] mm/page_owner: clamp skip_buddy_pages() PFN advance at MAX_ORDER_NR_PAGES boundary Ye Liu
  2026-06-25  1:47 ` [PATCH 2/2] mm/page_owner: use memcg_data snapshot instead of PageMemcgKmem() to avoid TOCTOU VM_BUG_ON Ye Liu
@ 2026-06-25  2:04 ` Andrew Morton
  2 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2026-06-25  2:04 UTC (permalink / raw)
  To: Ye Liu
  Cc: Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel

On Thu, 25 Jun 2026 09:47:03 +0800 Ye Liu <ye.liu@linux.dev> wrote:

> Fix two TOCTOU races found during review of [1].
> 
> page_owner reads page state locklessly by design. In two places the
> code reads the same metadata twice — once as a guard, then again as
> a use — and the page can be concurrently reallocated between the two:
> 
> Patch 1: buddy_order_unsafe() in skip_buddy_pages() can return garbage
> if the page is allocated between PageBuddy() and the private read,
> causing the PFN to skip past a pfn_valid() boundary.  Clamp the
> advance at MAX_ORDER_NR_PAGES.
> 
> Patch 2: PageMemcgKmem() in print_page_owner_memcg() re-reads
> folio->memcg_data and triggers VM_BUG_ON assertions if the page
> became a tail page or slab page.  Use the snapshot taken at entry.

That was fast.  I haven't pushed out mm-new yet, so Sashiko wasn't able
to apply these.

> [1] https://lore.kernel.org/all/20260623065234.31866-2-ye.liu@linux.dev/
> [2] https://sashiko.dev/#/patchset/20260623065234.31866-2-ye.liu@linux.dev

Nothing cites "[2]".    That's OK.


