* [PATCH 1/2] mm/page_owner: clamp skip_buddy_pages() PFN advance at MAX_ORDER_NR_PAGES boundary
2026-06-25 1:47 [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading Ye Liu
@ 2026-06-25 1:47 ` Ye Liu
2026-06-25 1:47 ` [PATCH 2/2] mm/page_owner: use memcg_data snapshot instead of PageMemcgKmem() to avoid TOCTOU VM_BUG_ON Ye Liu
2026-06-25 2:04 ` [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading Andrew Morton
2 siblings, 0 replies; 4+ messages in thread
From: Ye Liu @ 2026-06-25 1:47 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Ye Liu, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, linux-mm, linux-kernel
The lockless buddy_order_unsafe() read can return a garbage order
value if the page is concurrently allocated between the PageBuddy
check and the private read. If this bogus order is <= MAX_PAGE_ORDER,
skip_buddy_pages() would arbitrarily advance the PFN, potentially
jumping past a MAX_ORDER_NR_PAGES boundary whose pfn_valid() check
would have caught an offline memory section.
In read_page_owner(), which relies solely on boundary-aligned
pfn_valid() to guard pfn_to_page(), skipping the boundary could
cause pfn_to_page() to access an unmapped mem_section.
Clamp the advance so it never crosses the next MAX_ORDER_NR_PAGES
boundary. This is safe for all three callers: the pageblock-iterating
ones already handle boundary transitions in their outer loops, and
for read_page_owner() the worst case is one extra PageBuddy check per
1024 pages for a huge buddy block straddling the boundary.
Signed-off-by: Ye Liu <ye.liu@linux.dev>
---
mm/page_owner.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index ec9600025127..5c403bce35ce 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -435,6 +435,12 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old)
* to skip less than the full buddy block, but that is acceptable for page owner
* iteration purposes.
*
+ * The lockless read of buddy_order_unsafe() can also return a garbage order if
+ * the page is concurrently allocated and PageBuddy is cleared between the check
+ * and the read. Clamp the advance at the next MAX_ORDER_NR_PAGES boundary so
+ * that a bogus order cannot carry @pfn into an unvalidated memory section,
+ * which would break callers that rely on boundary-aligned pfn_valid() checks.
+ *
* Return: true if the page was skipped (caller should continue its loop),
* false if the page is not a buddy page and should be processed normally.
*/
@@ -446,8 +452,12 @@ static inline bool skip_buddy_pages(unsigned long *pfn, struct page *page)
return false;
order = buddy_order_unsafe(page);
- if (order <= MAX_PAGE_ORDER)
- *pfn += (1UL << order) - 1;
+ if (order <= MAX_PAGE_ORDER) {
+ unsigned long new_pfn = *pfn + (1UL << order);
+ unsigned long boundary = ALIGN(*pfn + 1, MAX_ORDER_NR_PAGES);
+
+ *pfn = min(new_pfn, boundary) - 1;
+ }
return true;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* [PATCH 2/2] mm/page_owner: use memcg_data snapshot instead of PageMemcgKmem() to avoid TOCTOU VM_BUG_ON
2026-06-25 1:47 [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading Ye Liu
2026-06-25 1:47 ` [PATCH 1/2] mm/page_owner: clamp skip_buddy_pages() PFN advance at MAX_ORDER_NR_PAGES boundary Ye Liu
@ 2026-06-25 1:47 ` Ye Liu
2026-06-25 2:04 ` [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading Andrew Morton
2 siblings, 0 replies; 4+ messages in thread
From: Ye Liu @ 2026-06-25 1:47 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Ye Liu, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, linux-mm, linux-kernel
print_page_owner_memcg() takes a snapshot of page->memcg_data via
READ_ONCE at the top of the function and guards against tail pages
and NULL memcg_data. However, at the end it calls PageMemcgKmem(page)
which internally calls folio_memcg_kmem() — and that function re-reads
folio->memcg_data and page->compound_head locklessly, wrapping both
in VM_BUG_ON assertions:
VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page);
VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio);
If the page is concurrently freed and reallocated as a THP tail page
or a slab page between the initial guards and this final call, the
VM_BUG_ON assertions can fire on debug builds (CONFIG_DEBUG_VM=y),
causing a kernel panic.
Fix by reusing the memcg_data snapshot already taken at function entry
instead of calling PageMemcgKmem(), which is semantically equivalent:
PageMemcgKmem()->folio_memcg_kmem()->folio->memcg_data & MEMCG_DATA_KMEM.
This avoids both the TOCTOU window and the assertions entirely.
Signed-off-by: Ye Liu <ye.liu@linux.dev>
---
mm/page_owner.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 5c403bce35ce..b3252ebc0307 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -568,7 +568,7 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
cgroup_name(memcg->css.cgroup, name, sizeof(name));
ret += scnprintf(kbuf + ret, count - ret,
"Charged %sto %smemcg %s\n",
- PageMemcgKmem(page) ? "(via objcg) " : "",
+ (memcg_data & MEMCG_DATA_KMEM) ? "(via objcg) " : "",
online ? "" : "offline ",
name);
out_unlock:
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading
2026-06-25 1:47 [PATCH 0/2] mm/page_owner: fix TOCTOU races in lockless page state reading Ye Liu
2026-06-25 1:47 ` [PATCH 1/2] mm/page_owner: clamp skip_buddy_pages() PFN advance at MAX_ORDER_NR_PAGES boundary Ye Liu
2026-06-25 1:47 ` [PATCH 2/2] mm/page_owner: use memcg_data snapshot instead of PageMemcgKmem() to avoid TOCTOU VM_BUG_ON Ye Liu
@ 2026-06-25 2:04 ` Andrew Morton
2 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2026-06-25 2:04 UTC (permalink / raw)
To: Ye Liu
Cc: Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel
On Thu, 25 Jun 2026 09:47:03 +0800 Ye Liu <ye.liu@linux.dev> wrote:
> Fix two TOCTOU races found during review of [1].
>
> page_owner reads page state locklessly by design. In two places the
> code reads the same metadata twice — once as a guard, then again as
> a use — and the page can be concurrently reallocated between the two:
>
> Patch 1: buddy_order_unsafe() in skip_buddy_pages() can return garbage
> if the page is allocated between PageBuddy() and the private read,
> causing the PFN to skip past a pfn_valid() boundary. Clamp the
> advance at MAX_ORDER_NR_PAGES.
>
> Patch 2: PageMemcgKmem() in print_page_owner_memcg() re-reads
> folio->memcg_data and triggers VM_BUG_ON assertions if the page
> became a tail page or slab page. Use the snapshot taken at entry.
That was fast. I haven't pushed out mm-new yet, so Sashiko wasn't able
to apply these.
> [1] https://lore.kernel.org/all/20260623065234.31866-2-ye.liu@linux.dev/
> [2] https://sashiko.dev/#/patchset/20260623065234.31866-2-ye.liu@linux.dev
Nothing cites "[2]". That's OK.
^ permalink raw reply [flat|nested] 4+ messages in thread