From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
	willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
	ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
	Rik van Riel
Subject: [RFC PATCH 20/45] mm: page_alloc: aggressively pack non-movable allocations in tainted SPBs on large systems
Date: Thu, 30 Apr 2026 16:20:49 -0400
Message-ID: <20260430202233.111010-21-riel@surriel.com>
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>
References: <20260430202233.111010-1-riel@surriel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Rik van Riel

On systems with many superpageblocks, sub-pageblock MOVABLE fragments
within already-tainted SPBs were being skipped by __rmqueue_claim() due
to the ALLOC_NOFRAGMENT pageblock_order floor. This caused the
allocator to fall through to clean SPBs, tainting them unnecessarily.

Introduce SPB_AGGRESSIVE_THRESHOLD: on systems with more than 8
superpageblocks, relax the min_order floor for the preferred category
(tainted SPBs) so non-movable allocations consume free space there at
any granularity. On small systems, preserve the pageblock_order floor
to protect MOVABLE capacity within tainted SPBs.

Signed-off-by: Rik van Riel
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
 mm/page_alloc.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 13bc57592cd5..215b7d6b95d2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2643,6 +2643,24 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
  */
 #define SPB_TAINTED_RESERVE	4
 
+/*
+ * On systems with many superpageblocks, we can afford to "write off"
+ * tainted superpageblocks by aggressively packing unmovable/reclaimable
+ * allocations into them — even sub-pageblock fragments — to keep clean
+ * superpageblocks clean for future 1GB hugepage and contiguous allocations.
+ *
+ * On small systems (few superpageblocks), each SPB represents a large
+ * fraction of total memory. Aggressively claiming sub-pageblock movable
+ * fragments from tainted SPBs would destroy MOVABLE capacity that the
+ * system can't afford to lose, with little benefit since there are too
+ * few SPBs to meaningfully separate movable from unmovable anyway.
+ *
+ * This threshold controls the crossover: above it, prefer concentrating
+ * non-movable allocations in tainted SPBs at any granularity; below it,
+ * only claim whole free pageblocks from tainted SPBs.
+ */
+#define SPB_AGGRESSIVE_THRESHOLD	8
+
 /**
  * sb_preferred_for_movable - Find the fullest clean superpageblock for movable
  * @zone: zone to search
@@ -3555,6 +3573,7 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
 {
 	int current_order;
 	int min_order = order;
+	int nofrag_min_order = order;
 	struct page *page;
 	int fallback_mt;
 	static const unsigned int cat_search[] = {
@@ -3568,9 +3587,18 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
 	 * Do not steal pages from freelists belonging to other pageblocks
 	 * i.e. orders < pageblock_order. If there are no local zones free,
 	 * the zonelists will be reiterated without ALLOC_NOFRAGMENT.
+	 *
+	 * Only apply this restriction to empty and clean superpageblocks.
+	 * Claiming within already-tainted superpageblocks does not cause
+	 * new fragmentation, and skipping them wastes free space that
+	 * could prevent tainting clean superpageblocks.
+	 *
+	 * When ALLOC_NOFRAGMENT is set, skip empty and clean superpageblocks
+	 * entirely to avoid tainting them. The slowpath will try reclaim and
+	 * compaction first, and only drop ALLOC_NOFRAGMENT as a last resort.
 	 */
 	if (order < pageblock_order && alloc_flags & ALLOC_NOFRAGMENT)
-		min_order = pageblock_order;
+		nofrag_min_order = pageblock_order;
 
 	/*
 	 * Find the largest available free page in a fallback migratetype.
@@ -3580,6 +3608,31 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
 	 * ones.
 	 */
 	for (c = 0; c < ARRAY_SIZE(cat_search); c++) {
+		/*
+		 * When avoiding fragmentation, do not search clean/empty
+		 * superpageblocks for fallback pages. Tainting a clean SPB
+		 * is the worst outcome — better to fail and let the slowpath
+		 * try reclaim and compaction in already-tainted SPBs first.
+		 */
+		if ((alloc_flags & ALLOC_NOFRAGMENT) &&
+		    cat_search[c] != SB_SEARCH_PREFERRED)
+			continue;
+
+		/*
+		 * For the preferred category (tainted SPBs for non-movable),
+		 * search all orders down to the allocation order on systems
+		 * with enough superpageblocks that we can afford to write off
+		 * tainted ones. These SPBs are already tainted, so sub-pageblock
+		 * stealing doesn't cause additional fragmentation.
+		 *
+		 * On small systems, keep the pageblock_order floor to preserve
+		 * MOVABLE capacity within tainted SPBs — see comment at
+		 * SPB_AGGRESSIVE_THRESHOLD.
+		 */
+		min_order = (cat_search[c] == SB_SEARCH_PREFERRED &&
+			     zone->nr_superpageblocks > SPB_AGGRESSIVE_THRESHOLD) ?
+				order : nofrag_min_order;
+
 		for (current_order = MAX_PAGE_ORDER;
 		     current_order >= min_order; --current_order) {
 			if (!should_try_claim_block(current_order,
@@ -3850,8 +3903,18 @@ static bool rmqueue_bulk(struct zone *zone, unsigned int order,
 	 * For movable allocations, prefer pageblocks from the
 	 * fullest clean superpageblock to pack allocations and
 	 * preserve empty superpageblocks for 1GB hugepages.
+	 *
+	 * For non-movable allocations, force ALLOC_NOFRAGMENT so
+	 * __rmqueue cannot steal a whole pageblock out of a clean
+	 * SPB. Stealing is the worst possible outcome for a bulk
+	 * refill: a single network or slab burst can taint dozens
+	 * of clean pageblocks. Phase 2 will adopt sub-pageblock
+	 * fragments from tainted SPBs before Phase 3 falls back to
+	 * the original alloc_flags (which may eventually steal at
+	 * the requested order, a much smaller fragmentation event).
 	 */
 	while (refilled + pageblock_nr_pages <= pages_needed) {
+		unsigned int p1_alloc_flags = alloc_flags;
 		struct page *page = NULL;
 
 		if (migratetype == MIGRATE_MOVABLE) {
@@ -3861,11 +3924,14 @@ static bool rmqueue_bulk(struct zone *zone, unsigned int order,
 			if (sb)
 				page = __rmqueue_from_sb(zone, pageblock_order,
 							 migratetype, sb);
+		} else if (!is_migrate_cma(migratetype)) {
+			p1_alloc_flags = (p1_alloc_flags | ALLOC_NOFRAGMENT) &
+					 ~ALLOC_NOFRAG_TAINTED_OK;
 		}
 
 		if (!page)
 			page = __rmqueue(zone, pageblock_order, migratetype,
-					 alloc_flags, &rmqm);
+					 p1_alloc_flags, &rmqm);
 
 		if (!page)
 			break;
-- 
2.52.0
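
For reviewers who want to poke at the crossover behaviour outside the
kernel, here is a small stand-alone user-space sketch (not part of the
patch) of the min_order floor selection that the __rmqueue_claim() hunk
above introduces. PAGEBLOCK_ORDER, SB_SEARCH_OTHER, and the
claim_min_order() helper are made-up stand-ins for illustration only;
in the kernel the inputs come from pageblock_order,
zone->nr_superpageblocks, ALLOC_NOFRAGMENT, and the cat_search[] table,
and the real logic sits inline in the category search loop.

/*
 * Sketch of the min_order floor selection added by this patch.
 * Constants and names other than SPB_AGGRESSIVE_THRESHOLD and
 * SB_SEARCH_PREFERRED are illustrative stand-ins, not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

#define PAGEBLOCK_ORDER			9	/* illustrative; arch dependent */
#define SPB_AGGRESSIVE_THRESHOLD	8	/* as defined by this patch */

enum sb_search_cat {
	SB_SEARCH_PREFERRED,	/* tainted SPBs, for non-movable allocations */
	SB_SEARCH_OTHER,	/* placeholder for the remaining categories */
};

/*
 * Lowest order the fallback search may descend to for one category:
 * ALLOC_NOFRAGMENT raises the floor to pageblock_order, except that on
 * systems with many superpageblocks the preferred (already tainted)
 * category may be packed all the way down to the requested order.
 */
static int claim_min_order(int order, bool nofragment,
			   enum sb_search_cat cat,
			   unsigned long nr_superpageblocks)
{
	int nofrag_min_order = order;

	if (order < PAGEBLOCK_ORDER && nofragment)
		nofrag_min_order = PAGEBLOCK_ORDER;

	if (cat == SB_SEARCH_PREFERRED &&
	    nr_superpageblocks > SPB_AGGRESSIVE_THRESHOLD)
		return order;

	return nofrag_min_order;
}

int main(void)
{
	/* order-3 request with ALLOC_NOFRAGMENT set */
	printf("64 SPBs, preferred category: floor = %d\n",
	       claim_min_order(3, true, SB_SEARCH_PREFERRED, 64));	/* 3 */
	printf(" 4 SPBs, preferred category: floor = %d\n",
	       claim_min_order(3, true, SB_SEARCH_PREFERRED, 4));	/* 9 */
	return 0;
}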