* [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading
@ 2026-06-25  1:47 Ye Liu
  2026-06-25  1:47 ` [PATCH 1/2] mm/page_owner: clamp skip_buddy_pages() PFN advance at MAX_ORDER_NR_PAGES boundary Ye Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Ye Liu @ 2026-06-25  1:47 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka
  Cc: Ye Liu, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, linux-mm, linux-kernel

Fix two TOCTOU races found during review of [1].

page_owner reads page state locklessly by design. In two places the
code reads the same metadata twice — once as a guard, then again as
a use — and the page can be concurrently reallocated between the two:

Patch 1: buddy_order_unsafe() in skip_buddy_pages() can return garbage
if the page is allocated between PageBuddy() and the private read,
causing the PFN to skip past a pfn_valid() boundary.  Clamp the
advance at MAX_ORDER_NR_PAGES.

Patch 2: PageMemcgKmem() in print_page_owner_memcg() re-reads
folio->memcg_data and triggers VM_BUG_ON assertions if the page
became a tail page or slab page.  Use the snapshot taken at entry.

[1] https://lore.kernel.org/all/20260623065234.31866-2-ye.liu@linux.dev/
[2] https://sashiko.dev/#/patchset/20260623065234.31866-2-ye.liu@linux.dev

Ye Liu (2):
  mm/page_owner: clamp skip_buddy_pages() PFN advance at
    MAX_ORDER_NR_PAGES boundary
  mm/page_owner: use memcg_data snapshot instead of PageMemcgKmem() to
    avoid TOCTOU VM_BUG_ON

 mm/page_owner.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

--
2.43.0
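
The effect described for Patch 1 can be modelled in userspace. The sketch below is a simplified stand-in for a scan loop that performs a validity check only at MAX_ORDER_NR_PAGES-aligned PFNs; it is not the kernel code. The function name boundaries_checked(), the fixed block size of 1024 pages, and the assumption that every visited page reports the same order are all hypothetical, chosen only to show how an unclamped skip can step over the aligned check.

```c
#include <stdbool.h>

/* Assumed for illustration: MAX_PAGE_ORDER == 10, i.e. 1024 pages per block. */
#define MAX_ORDER_NR_PAGES 1024UL

/*
 * Hypothetical model of a PFN scan. Counts how many block boundaries in
 * [start, end) reach the aligned check (the place where the real loop
 * would call pfn_valid()) when every visited page reports `order`.
 */
static unsigned long boundaries_checked(unsigned long start, unsigned long end,
					unsigned int order, bool clamp)
{
	unsigned long checked = 0;

	for (unsigned long pfn = start; pfn < end; pfn++) {
		if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0)
			checked++;	/* stands in for the pfn_valid() check */

		unsigned long next = pfn + (1UL << order);

		if (clamp) {
			/* Same value as ALIGN(pfn + 1, MAX_ORDER_NR_PAGES). */
			unsigned long boundary = (pfn + MAX_ORDER_NR_PAGES) &
						 ~(MAX_ORDER_NR_PAGES - 1);

			if (next > boundary)
				next = boundary;
		}

		pfn = next - 1;	/* the loop's pfn++ lands on `next` */
	}

	return checked;
}
```

In this model, a scan that starts at an unaligned PFN with a bogus order of 10 never lands on an aligned PFN without the clamp, while the clamped variant stops at every boundary.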




* [PATCH 1/2] mm/page_owner: clamp skip_buddy_pages() PFN advance at MAX_ORDER_NR_PAGES boundary
From: Ye Liu @ 2026-06-25  1:47 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka
  Cc: Ye Liu, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, linux-mm, linux-kernel

The lockless buddy_order_unsafe() read can return a garbage order
value if the page is concurrently allocated between the PageBuddy
check and the private read.  If this bogus order is <= MAX_PAGE_ORDER,
skip_buddy_pages() would arbitrarily advance the PFN, potentially
jumping past a MAX_ORDER_NR_PAGES boundary whose pfn_valid() check
would have caught an offline memory section.

In read_page_owner(), which relies solely on boundary-aligned
pfn_valid() to guard pfn_to_page(), skipping the boundary could
cause pfn_to_page() to access an unmapped mem_section.

Clamp the advance so it never crosses the next MAX_ORDER_NR_PAGES
boundary.  This is safe for all three callers: the pageblock-iterating
ones already handle boundary transitions in their outer loops, and
for read_page_owner() the worst case is one extra PageBuddy check per
MAX_ORDER_NR_PAGES pages (1024 with the default MAX_PAGE_ORDER of 10)
when a huge buddy block straddles the boundary.

Signed-off-by: Ye Liu <ye.liu@linux.dev>
---
 mm/page_owner.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index ec9600025127..5c403bce35ce 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -435,6 +435,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
  * to skip less than the full buddy block, but that is acceptable for page owner
  * iteration purposes.
  *
+ * The lockless read of buddy_order_unsafe() can also return a garbage order if
+ * the page is concurrently allocated and PageBuddy is cleared between the check
+ * and the read. Clamp the advance at the next MAX_ORDER_NR_PAGES boundary so
+ * that a bogus order cannot carry @pfn into an unvalidated memory section,
+ * which would break callers that rely on boundary-aligned pfn_valid() checks.
+ *
  * Return: true if the page was skipped (caller should continue its loop),
  *         false if the page is not a buddy page and should be processed normally.
  */
@@ -446,8 +452,12 @@ static inline bool skip_buddy_pages(unsigned long *pfn, struct page *page)
 		return false;
 
 	order = buddy_order_unsafe(page);
-	if (order <= MAX_PAGE_ORDER)
-		*pfn += (1UL << order) - 1;
+	if (order <= MAX_PAGE_ORDER) {
+		unsigned long new_pfn = *pfn + (1UL << order);
+		unsigned long boundary = ALIGN(*pfn + 1, MAX_ORDER_NR_PAGES);
+
+		*pfn = min(new_pfn, boundary) - 1;
+	}
 
 	return true;
 }
-- 
2.43.0
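
As a worked example of the clamp arithmetic in the hunk above, here is a minimal userspace sketch. It assumes MAX_PAGE_ORDER is 10 (so MAX_ORDER_NR_PAGES is 1024); ALIGN_UP() and clamped_skip() are hypothetical stand-ins for the kernel's ALIGN() and for the body of skip_buddy_pages(), not the kernel definitions.

```c
#include <assert.h>

/* Assumed for illustration: a max-order block spans 1024 pages. */
#define MAX_PAGE_ORDER		10
#define MAX_ORDER_NR_PAGES	(1UL << MAX_PAGE_ORDER)

/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a)		(((x) + (a) - 1) & ~((a) - 1))

/*
 * Hypothetical model of the clamped advance: returns the PFN the caller
 * should hold before its loop increment, so that the next PFN visited is
 * either the end of the buddy block or the next block boundary,
 * whichever comes first.
 */
static unsigned long clamped_skip(unsigned long pfn, unsigned int order)
{
	unsigned long new_pfn, boundary;

	assert(order <= MAX_PAGE_ORDER);

	new_pfn = pfn + (1UL << order);
	boundary = ALIGN_UP(pfn + 1, MAX_ORDER_NR_PAGES);

	return (new_pfn < boundary ? new_pfn : boundary) - 1;
}
```

With these assumptions, a bogus order of 10 read at PFN 1000 yields 1023 rather than 2023, so the caller's next iteration lands on PFN 1024 and performs its aligned check. A small order that stays inside the block is unaffected: order 3 at PFN 1000 still yields 1007.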




* [PATCH 2/2] mm/page_owner: use memcg_data snapshot instead of PageMemcgKmem() to avoid TOCTOU VM_BUG_ON
From: Ye Liu @ 2026-06-25  1:47 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka
  Cc: Ye Liu, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Zi Yan, linux-mm, linux-kernel

print_page_owner_memcg() takes a snapshot of page->memcg_data via
READ_ONCE at the top of the function and guards against tail pages
and NULL memcg_data.  However, at the end it calls PageMemcgKmem(page)
which internally calls folio_memcg_kmem() — and that function re-reads
folio->memcg_data and page->compound_head locklessly, wrapping both
in VM_BUG_ON assertions:

    VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page);
    VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio);

If the page is concurrently freed and reallocated as a THP tail page
or a slab page between the initial guards and this final call, the
VM_BUG_ON assertions can fire on debug builds (CONFIG_DEBUG_VM=y),
causing a kernel panic.

Fix by reusing the memcg_data snapshot already taken at function entry
instead of calling PageMemcgKmem().  The two are semantically equivalent,
since PageMemcgKmem() -> folio_memcg_kmem() reduces to
folio->memcg_data & MEMCG_DATA_KMEM.  This closes the TOCTOU window and
avoids the assertions entirely.

Signed-off-by: Ye Liu <ye.liu@linux.dev>
---
 mm/page_owner.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 5c403bce35ce..b3252ebc0307 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -568,7 +568,7 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
 	cgroup_name(memcg->css.cgroup, name, sizeof(name));
 	ret += scnprintf(kbuf + ret, count - ret,
 			"Charged %sto %smemcg %s\n",
-			PageMemcgKmem(page) ? "(via objcg) " : "",
+			(memcg_data & MEMCG_DATA_KMEM) ? "(via objcg) " : "",
 			online ? "" : "offline ",
 			name);
 out_unlock:
-- 
2.43.0
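
The snapshot pattern this patch relies on can be sketched in userspace as follows. This is an illustrative model only: struct demo_page, the DEMO_* flag bits, and the race hook are hypothetical, and a relaxed C11 atomic load stands in for READ_ONCE(). The point is that every decision after the guard uses the local copy, so a concurrent change to the shared field cannot make the guard and the use disagree.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative flag bits; stand-ins, not the kernel's definitions. */
#define DEMO_DATA_OBJEXTS	(1UL << 0)
#define DEMO_DATA_KMEM		(1UL << 1)

struct demo_page {
	_Atomic unsigned long memcg_data;
};

/* Hook used to simulate a concurrent writer between guard and use. */
typedef void (*demo_race_fn)(struct demo_page *page);

enum demo_charge { DEMO_SKIP = -1, DEMO_PLAIN = 0, DEMO_VIA_OBJCG = 1 };

static enum demo_charge demo_classify(struct demo_page *page, demo_race_fn race)
{
	/* Single read of the shared field; everything below uses the copy. */
	unsigned long memcg_data =
		atomic_load_explicit(&page->memcg_data, memory_order_relaxed);

	/* Guard: nothing charged, or the field holds slab object extensions. */
	if (!memcg_data || (memcg_data & DEMO_DATA_OBJEXTS))
		return DEMO_SKIP;

	/* Simulated race window: the shared field may change here. */
	if (race)
		race(page);

	/* Use: test the snapshot, not page->memcg_data. */
	return (memcg_data & DEMO_DATA_KMEM) ? DEMO_VIA_OBJCG : DEMO_PLAIN;
}

/* Simulated writer: the page is reused as a slab page. */
static void demo_become_slab(struct demo_page *page)
{
	atomic_store_explicit(&page->memcg_data, DEMO_DATA_OBJEXTS,
			      memory_order_relaxed);
}
```

In this model the result reflects the state seen at the guard even when demo_become_slab() runs in the window. A second call that re-reads the field then takes the skip path, which is the behaviour a lockless reader can tolerate.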




* Re: [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading
From: Andrew Morton @ 2026-06-25  2:04 UTC (permalink / raw)
  To: Ye Liu
  Cc: Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel

On Thu, 25 Jun 2026 09:47:03 +0800 Ye Liu <ye.liu@linux.dev> wrote:

> Fix two TOCTOU races found during review of [1].
> 
> page_owner reads page state locklessly by design. In two places the
> code reads the same metadata twice — once as a guard, then again as
> a use — and the page can be concurrently reallocated between the two:
> 
> Patch 1: buddy_order_unsafe() in skip_buddy_pages() can return garbage
> if the page is allocated between PageBuddy() and the private read,
> causing the PFN to skip past a pfn_valid() boundary.  Clamp the
> advance at MAX_ORDER_NR_PAGES.
> 
> Patch 2: PageMemcgKmem() in print_page_owner_memcg() re-reads
> folio->memcg_data and triggers VM_BUG_ON assertions if the page
> became a tail page or slab page.  Use the snapshot taken at entry.

That was fast.  I haven't pushed out mm-new yet, so Sashiko wasn't able
to apply these.

> [1] https://lore.kernel.org/all/20260623065234.31866-2-ye.liu@linux.dev/
> [2] https://sashiko.dev/#/patchset/20260623065234.31866-2-ye.liu@linux.dev

Nothing cites "[2]".    That's OK.


