From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
	willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
	ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
	Rik van Riel, Rik van Riel
Subject: [RFC PATCH 25/45] mm: page_alloc: skip pageblock compatibility threshold in tainted SPBs
Date: Thu, 30 Apr 2026 16:20:54 -0400
Message-ID: <20260430202233.111010-26-riel@surriel.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>
References: <20260430202233.111010-1-riel@surriel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Rik van Riel

Summary: __rmqueue_smallest Pass 2b is the last resort
before tainting a fresh clean superpageblock: it walks MOVABLE
sub-pageblock free chunks inside already-tainted SPBs, calling
try_to_claim_block() to relabel a movable pageblock as the requested
non-movable type. If Pass 2b fails, the allocator falls through to
Pass 3 and taints a clean SPB.

try_to_claim_block() guards the relabel with a 50% compatibility
check: free_pages + alike_pages must be at least pageblock_nr_pages/2.
The guard exists to protect a generic clean MOVABLE pageblock from
being relabeled while most of its pages are still in-use movable
allocations.

Inside a tainted SPB the guard is harmful, not protective. The SPB has
already accepted fragmentation, and stranding a few in-use movable
pages inside a relabeled pageblock is dramatically cheaper than
tainting an entire clean SPB.

bpftrace on a devvm under realistic load caught the pathology
directly: at the moment a clean SPB was tainted, all 8 existing
tainted SPBs had nr_free=0 (no whole free pageblocks), collectively
held ~21k movable free pages distributed across MOVABLE pageblocks,
and try_to_claim_block() had failed 29182 of 29228 calls (99.84%)
over the prior few minutes. Pass 2b was effectively unable to absorb
non-movable demand into the tainted pool.

Add a from_tainted_spb parameter to try_to_claim_block() and skip the
50% threshold when it is set. Pass 2b passes true (it walks SB_TAINTED
lists exclusively); __rmqueue_claim() passes false to preserve its
existing fragmentation-protection semantics.

Test Plan:
Devvm bpftrace setup at ~/spb-monitors/spb-taint-walk.bt watches
clean->tainted transitions in zone Normal and tracks
try_to_claim_block() call/ok/fail counters. Before the change the
fail rate was 99.84%, with periodic clean SPB taints under load.
After the change, expect the fail rate to drop sharply and the count
of tainted SPBs to plateau at the boot-recruited set.

Signed-off-by: Rik van Riel
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
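Notes (not part of the commit message):

The taint-walk monitor script itself is not included in this series.
For reference, a minimal sketch of its counting half; this assumes
try_to_claim_block() survives inlining so a kprobe can attach to it
by name, and the probe choices and interval are illustrative only:

  // count calls, successes (non-NULL page) and failures (NULL)
  kprobe:try_to_claim_block { @calls = count(); }
  kretprobe:try_to_claim_block /retval != 0/ { @ok = count(); }
  kretprobe:try_to_claim_block /retval == 0/ { @fail = count(); }
  interval:s:60 { print(@calls); print(@ok); print(@fail); }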
 mm/page_alloc.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 493db531b869..67cc8165ab1f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2776,7 +2776,8 @@ static struct page *claim_whole_block(struct zone *zone, struct page *page,
 			int current_order, int order, int new_type, int old_type);
 static struct page *try_to_claim_block(struct zone *zone, struct page *page,
 				int current_order, int order, int start_type,
-				int block_type, unsigned int alloc_flags);
+				int block_type, unsigned int alloc_flags,
+				bool from_tainted_spb);
 
 static __always_inline
 struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
@@ -2941,7 +2942,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 			page = try_to_claim_block(zone, page,
 						  current_order, order,
 						  migratetype, MIGRATE_MOVABLE,
-						  0);
+						  0, true);
 			if (!page)
 				continue;
 			trace_mm_page_alloc_zone_locked(
@@ -3420,11 +3421,17 @@ claim_whole_block(struct zone *zone, struct page *page,
  * not, we check the pageblock for constituent pages; if at least half of the
  * pages are free or compatible, we can still claim the whole block, so pages
  * freed in the future will be put on the correct free list.
+ *
+ * @from_tainted_spb: caller has already verified the block lives in a tainted
+ * superpageblock, where SPB-level fragmentation has already been accepted.
+ * Skip the per-pageblock compatibility threshold so we can absorb non-movable
+ * demand into the existing tainted SPB instead of tainting a fresh clean one.
  */
 static struct page *
 try_to_claim_block(struct zone *zone, struct page *page,
 		   int current_order, int order, int start_type,
-		   int block_type, unsigned int alloc_flags)
+		   int block_type, unsigned int alloc_flags,
+		   bool from_tainted_spb)
 {
 	int free_pages, movable_pages, alike_pages;
 	unsigned long start_pfn;
@@ -3480,8 +3487,14 @@ try_to_claim_block(struct zone *zone, struct page *page,
 	/*
 	 * If a sufficient number of pages in the block are either free or of
 	 * compatible migratability as our allocation, claim the whole block.
-	 */
-	if (free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
+	 * The compatibility threshold protects clean MOVABLE pageblocks from
+	 * being relabeled when most of their pages are still in-use movable
+	 * allocations. Inside a tainted SPB the protection is unnecessary:
+	 * fragmentation has already been accepted at the SPB level, and
+	 * relabeling is much cheaper than tainting a fresh clean SPB.
+	 */
+	if (from_tainted_spb ||
+	    free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
 	    page_group_by_mobility_disabled) {
 		__move_freepages_block(zone, start_pfn, block_type, start_type);
 		set_pageblock_migratetype(pfn_to_page(start_pfn), start_type);
@@ -3721,7 +3734,8 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
 		page = try_to_claim_block(zone, page, current_order, order,
 					  start_migratetype,
-					  fallback_mt, alloc_flags);
+					  fallback_mt, alloc_flags,
+					  false);
 		if (page) {
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 						    start_migratetype,
 						    fallback_mt);
-- 
2.52.0
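[Illustrative addendum, not part of the patch: a self-contained
userspace sketch of the threshold arithmetic this change bypasses.
It assumes pageblock_order == 9 (512 4KB pages per 2MB pageblock, as
on x86-64); can_claim() is a hypothetical stand-in for the condition
that lives inline in try_to_claim_block().]

  #include <stdio.h>
  #include <stdbool.h>

  #define PAGEBLOCK_ORDER    9
  #define PAGEBLOCK_NR_PAGES (1 << PAGEBLOCK_ORDER) /* 512 pages */

  /*
   * Mirrors the patched condition: a caller inside a tainted SPB
   * skips the 50% free-or-alike threshold entirely.
   */
  static bool can_claim(int free_pages, int alike_pages,
                        bool from_tainted_spb)
  {
          return from_tainted_spb ||
                 free_pages + alike_pages >= (1 << (PAGEBLOCK_ORDER - 1));
  }

  int main(void)
  {
          /* 100 free + 50 alike = 150 < 256: fails the 50% check... */
          printf("clean pageblock: %d\n", can_claim(100, 50, false));
          /* ...but the same block is claimed inside a tainted SPB. */
          printf("tainted SPB:     %d\n", can_claim(100, 50, true));
          return 0;
  }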