From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
	willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
	ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
	Rik van Riel, Rik van Riel
Subject: [RFC PATCH 19/45] mm: page_alloc: prevent atomic allocations from tainting clean SPBs
Date: Thu, 30 Apr 2026 16:20:48 -0400
Message-ID: <20260430202233.111010-20-riel@surriel.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>
References: <20260430202233.111010-1-riel@surriel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Rik van Riel

Non-DIRECT_RECLAIM (atomic) allocations that fail with
ALLOC_NOFRAGMENT previously dropped the flag entirely and retried,
allowing them to taint clean superpageblocks. This was the primary
source of taint spreading observed on production systems.

Two changes keep atomic allocations within tainted SPBs:

1. Extend Pass 2 in __rmqueue_smallest with a sub-pageblock phase
   (Pass 2b). The original Pass 2 only finds whole free pageblocks
   (>= pageblock order) in tainted SPBs. Pass 2b searches for
   sub-pageblock-order free blocks and uses try_to_claim_block to
   claim the pageblock if it has enough compatible pages. This finds
   pages in tainted SPBs that have fragmented free space but no
   whole free pageblocks.

2. Add an ALLOC_NOFRAG_TAINTED_OK intermediate flag. Instead of
   going directly from ALLOC_NOFRAGMENT to no protection, atomic
   allocations first retry with ALLOC_NOFRAG_TAINTED_OK, which
   allows __rmqueue_steal to search tainted SPBs only. Clean and
   empty SPBs remain protected. Only if stealing from tainted SPBs
   also fails is ALLOC_NOFRAGMENT dropped entirely, as a last
   resort.

Signed-off-by: Rik van Riel
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
 mm/internal.h   |  1 +
 mm/page_alloc.c | 87 +++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 02f1c7d36b85..f641795688af 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1413,6 +1413,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_TRYLOCK		0x400 /* Only use spin_trylock in allocation path */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+#define ALLOC_NOFRAG_TAINTED_OK	0x1000 /* NOFRAGMENT, but allow steal from tainted SPBs */
 
 /* Flags that allow allocations below the min watermark. */
 #define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8ce96db50c2f..13bc57592cd5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2713,6 +2713,9 @@ static struct page *__rmqueue_from_sb(struct zone *zone, unsigned int order,
  */
 static struct page *claim_whole_block(struct zone *zone, struct page *page,
 		int current_order, int order, int new_type, int old_type);
+static struct page *try_to_claim_block(struct zone *zone, struct page *page,
+		int current_order, int order, int start_type,
+		int block_type, unsigned int alloc_flags);
 
 static __always_inline
 struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
@@ -2782,6 +2785,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
	 * free list (reset by mark_pageblock_free), so the search above
	 * misses them. Claim them inline to keep non-movable allocations
	 * concentrated in already-tainted superpageblocks.
+	 *
+	 * Try whole pageblock orders first (preferred for PCP buddy optimization),
+	 * then fall back to sub-pageblock orders. Sub-pageblock claiming uses
+	 * try_to_claim_block which checks whether the pageblock has enough
+	 * compatible pages to justify claiming it.
	 */
	if (!movable && !is_migrate_cma(migratetype)) {
		for (full = SB_FULL; full < __NR_SB_FULLNESS; full++) {
@@ -2814,6 +2822,43 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 				}
 			}
 		}
+		/* Pass 2b: sub-pageblock orders in tainted SPBs */
+		for (full = SB_FULL; full < __NR_SB_FULLNESS; full++) {
+			list_for_each_entry(sb,
+				    &zone->spb_lists[SB_TAINTED][full], list) {
+				int co;
+
+				if (!sb->nr_free_pages)
+					continue;
+				for (co = min_t(int, pageblock_order - 1,
+						NR_PAGE_ORDERS - 1);
+				     co >= (int)order;
+				     --co) {
+					current_order = co;
+					area = &sb->free_area[current_order];
+					page = get_page_from_free_area(
+							area, MIGRATE_MOVABLE);
+					if (!page)
+						continue;
+					if (get_pageblock_isolate(page))
+						continue;
+					if (is_migrate_cma(
+					    get_pageblock_migratetype(page)))
+						continue;
+					page = try_to_claim_block(zone, page,
+						current_order, order,
+						migratetype, MIGRATE_MOVABLE,
+						0);
+					if (!page)
+						continue;
+					trace_mm_page_alloc_zone_locked(
+						page, order, migratetype,
+						pcp_allowed_order(order) &&
+						migratetype < MIGRATE_PCPTYPES);
+					return page;
+				}
+			}
+		}
 	}
 
 	/* Empty superpageblocks: try before falling back to non-preferred category */
@@ -3566,12 +3611,23 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
  * the block as its current migratetype, potentially causing fragmentation.
  */
 static __always_inline struct page *
-__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
+__rmqueue_steal(struct zone *zone, int order, int start_migratetype,
+		unsigned int alloc_flags)
 {
 	struct superpageblock *sb;
 	int current_order;
 	struct page *page;
 	int fallback_mt;
+	unsigned int search_cats;
+
+	/*
+	 * When ALLOC_NOFRAG_TAINTED_OK is set, only steal from tainted
+	 * SPBs to avoid tainting clean ones. Otherwise search all categories.
+	 */
+	if (alloc_flags & ALLOC_NOFRAG_TAINTED_OK)
+		search_cats = SB_SEARCH_PREFERRED;
+	else
+		search_cats = SB_SEARCH_PREFERRED | SB_SEARCH_FALLBACK;
 
 	/*
 	 * Search per-superpageblock free lists for fallback migratetypes.
@@ -3581,7 +3637,7 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
 		page = __rmqueue_sb_find_fallback(zone, current_order,
 						  start_migratetype, &fallback_mt,
-						  SB_SEARCH_PREFERRED | SB_SEARCH_FALLBACK);
+						  search_cats);
 
 		if (!page)
 			continue;
@@ -3681,8 +3737,10 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
 		}
 		fallthrough;
 	case RMQUEUE_STEAL:
-		if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
-			page = __rmqueue_steal(zone, order, migratetype);
+		if (!(alloc_flags & ALLOC_NOFRAGMENT) ||
+		    (alloc_flags & ALLOC_NOFRAG_TAINTED_OK)) {
+			page = __rmqueue_steal(zone, order, migratetype,
+					       alloc_flags);
 			if (page) {
 				*mode = RMQUEUE_STEAL;
 				return page;
 			}
@@ -5301,9 +5359,24 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	/*
 	 * It's possible on a UMA machine to get through all zones that are
 	 * fragmented. If avoiding fragmentation, reset and try again.
-	 */
-	if (no_fallback && !defrag_mode) {
-		alloc_flags &= ~ALLOC_NOFRAGMENT;
+	 *
+	 * For allocations that can do direct reclaim, keep NOFRAGMENT set
+	 * and let the slowpath try reclaim and compaction to free pages in
+	 * already-tainted superpageblocks before allowing clean SPBs to be
+	 * tainted.
+	 *
+	 * Atomic allocations cannot reclaim, but try an intermediate step
+	 * first: allow steal/claim from tainted SPBs only. This avoids
+	 * tainting clean SPBs while still finding pages in tainted ones.
+	 * Only drop NOFRAGMENT entirely if that also fails.
+	 */
+	if (no_fallback && !defrag_mode &&
+	    !(gfp_mask & __GFP_DIRECT_RECLAIM)) {
+		if (!(alloc_flags & ALLOC_NOFRAG_TAINTED_OK)) {
+			alloc_flags |= ALLOC_NOFRAG_TAINTED_OK;
+			goto retry;
+		}
+		alloc_flags &= ~(ALLOC_NOFRAGMENT | ALLOC_NOFRAG_TAINTED_OK);
 		goto retry;
 	}
-- 
2.52.0
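
To make change 1 above concrete, here is a small standalone userspace
sketch of the Pass 2b search order. Everything in it (struct spb,
try_to_claim, pass2b_search, the order constants) is a hypothetical
stand-in rather than kernel code; only the idea mirrors the patch:
within a tainted SPB, descend from just below pageblock order to the
requested order and claim the first block whose pageblock passes a
suitability check.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative constants; the real values come from the kernel config. */
#define PAGEBLOCK_ORDER	9
#define NR_ORDERS	11

/* Toy model of a superpageblock: per-order MOVABLE free-list occupancy. */
struct spb {
	int nr_free_pages;
	bool free_list_nonempty[NR_ORDERS];
};

/* Stand-in for try_to_claim_block(): claim only if the SPB holds enough pages. */
static bool try_to_claim(const struct spb *sb, int order)
{
	return sb->nr_free_pages >= (1 << order);
}

/* Returns the order a block was found at, or -1 if this SPB cannot help. */
static int pass2b_search(const struct spb *sb, int request_order)
{
	int co;

	if (!sb->nr_free_pages)
		return -1;	/* nothing free in this SPB at all */

	/* Whole pageblocks were already handled by Pass 2; start one below. */
	for (co = PAGEBLOCK_ORDER - 1; co >= request_order; co--) {
		if (!sb->free_list_nonempty[co])
			continue;
		if (try_to_claim(sb, co))
			return co;	/* fragmented free space, claimed here */
	}
	return -1;
}

int main(void)
{
	struct spb fragmented = { .nr_free_pages = 96 };

	fragmented.free_list_nonempty[4] = true;	/* only order-4 chunks free */
	printf("order found: %d\n", pass2b_search(&fragmented, 2));
	return 0;
}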
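
Change 2 is an escalation ladder, sketched below under the same caveat:
rmqueue and atomic_alloc are illustrative stand-ins for the real
allocation paths, and the ALLOC_NOFRAGMENT value is assumed here; only
ALLOC_NOFRAG_TAINTED_OK (0x1000) comes from the patch.

#include <stdbool.h>
#include <stdio.h>

#define ALLOC_NOFRAGMENT	0x4	/* assumed value, for illustration */
#define ALLOC_NOFRAG_TAINTED_OK	0x1000	/* from the patch */

/*
 * Toy availability check: strict NOFRAGMENT steals nothing; with
 * NOFRAG_TAINTED_OK only tainted SPBs are eligible; with neither flag
 * any SPB (including clean ones, which then become tainted) works.
 */
static bool rmqueue(unsigned int flags, bool tainted_has_pages,
		    bool clean_has_pages)
{
	if (flags & ALLOC_NOFRAGMENT) {
		if (flags & ALLOC_NOFRAG_TAINTED_OK)
			return tainted_has_pages;
		return false;
	}
	return tainted_has_pages || clean_has_pages;
}

/* Returns which rung of the ladder satisfied the request (0 = failed). */
static int atomic_alloc(bool tainted_has_pages, bool clean_has_pages)
{
	unsigned int flags = ALLOC_NOFRAGMENT;

	/* Step 1: permit stealing, but only from already-tainted SPBs. */
	flags |= ALLOC_NOFRAG_TAINTED_OK;
	if (rmqueue(flags, tainted_has_pages, clean_has_pages))
		return 1;

	/* Step 2: drop the protection entirely as a last resort. */
	flags &= ~(ALLOC_NOFRAGMENT | ALLOC_NOFRAG_TAINTED_OK);
	if (rmqueue(flags, tainted_has_pages, clean_has_pages))
		return 2;
	return 0;
}

int main(void)
{
	printf("pages only in tainted SPBs -> step %d\n", atomic_alloc(true, false));
	printf("pages only in clean SPBs   -> step %d\n", atomic_alloc(false, true));
	return 0;
}

With free pages available in tainted SPBs the ladder stops at step 1 and
clean SPBs stay clean; only when the tainted SPBs are exhausted does
step 2 taint a clean one.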