From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org, willy@infradead.org, surenb@google.com, hannes@cmpxchg.org, ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev, Rik van Riel, Rik van Riel
Subject: [RFC PATCH 42/45] mm: page_alloc: cross-MOV borrow within tainted SPBs
Date: Thu, 30 Apr 2026 16:21:11 -0400
Message-ID: <20260430202233.111010-43-riel@surriel.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>
References: <20260430202233.111010-1-riel@surriel.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Rik van Riel

Pass 2c (cross-non-movable borrow) is restricted to UNMOV<->RECL: it
borrows individual buddies from the opposite non-movable migratetype's
free list within a tainted SPB without relabeling the source pageblock.
Movable free pages within tainted SPBs are deliberately excluded because
long-lived non-movable content in a MOV-tagged pageblock blocks
compaction of that pageblock.

Under workloads that mostly free MOVABLE-tagged content into tainted
SPBs (page-cache reclaim, anon LRU shrink), the result is a tainted SPB
with tens to hundreds of thousands of free pages all on the MOV free
list, invisible to non-movable demand. Pass 1 doesn't see them (they're
not on the requesting mt's list), Pass 2/2b can't claim a whole
pageblock when sb->nr_free == 0 (no contiguous free PB to relabel), and
Pass 2c skips MOV. The non-movable alloc falls through to Pass 3 and
taints a fresh clean SPB even though the existing tainted ones have
plenty of unused space.

Add Pass 2d, mirroring Pass 2c semantics but borrowing from the MOVABLE
free list within already-tainted SPBs. The borrowed page is used for the
requesting non-movable mt for the lifetime of the allocation, then on
free returns to the MOVABLE list (no pageblock relabel; same "borrow"
mechanism as 2c).

Tradeoff: the borrowed UNMOV/RECL content blocks compaction of its
source pageblock until the alloc is freed. Restricted to SB_TAINTED so
contamination is bounded to one pageblock inside an already-tainted SPB.
The alternative, Pass 3 tainting a fresh clean SPB, removes a 1 GiB
region from the clean pool, which is strictly worse for the
anti-fragmentation invariant the series is built around.

Skipped for movable allocs (they use Pass 4) and CMA allocs.

Observable as the new SPB_ALLOC_OUTCOME_PASS_2D outcome on the
spb_alloc_walk tracepoint.

Expected effect on the live workload: tainted SPB count growth slows
substantially; allocations that were previously taking the PASS_3 escape
now succeed in PASS_2D.

Signed-off-by: Rik van Riel
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
 mm/page_alloc.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2f5d3ba1c0ef..af499f0a1a48 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3280,6 +3280,79 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 				}
 			}
 		}
+
+		/*
+		 * Pass 2d: cross-MOV borrow within tainted SPBs.
+		 *
+		 * If Pass 1/2/2b/2c all failed, the next step is Pass 3
+		 * which would taint a fresh clean SPB. Before that, try
+		 * to borrow an individual buddy from a tainted SPB's
+		 * MIGRATE_MOVABLE free list.
+		 *
+		 * Tainted SPBs accumulate large amounts of free space on
+		 * the MOV free list (e.g. reclaimed page-cache pages
+		 * whose pageblock tag is MOVABLE). Pass 1 cannot see
+		 * those for non-movable allocs, Pass 2/2b cannot claim a
+		 * whole pageblock when sb->nr_free == 0, and Pass 2c is
+		 * restricted to UNMOV<->RECL. The result is a tainted
+		 * SPB with tens to hundreds of thousands of free pages
+		 * all unreachable from non-movable demand.
+		 *
+		 * Borrow semantics mirror Pass 2c: take a buddy from the
+		 * MOVABLE free list without relabeling the source
+		 * pageblock. The page is used for the requesting non-
+		 * movable mt for the lifetime of the allocation, then on
+		 * free returns to the MOVABLE list.
+		 *
+		 * Cost: the borrowed UNMOV/RECL content blocks
+		 * compaction of its source pageblock until freed.
+		 * Restricted to SB_TAINTED so the contamination is
+		 * bounded to an already-tainted SPB; the alternative
+		 * (Pass 3) taints a fresh clean SPB and removes a 1 GiB
+		 * region from the clean pool, which is strictly worse.
+		 *
+		 * Skipped for movable allocs (they have Pass 4) and for
+		 * CMA allocs.
+		 */
+		if (!movable && !is_migrate_cma(migratetype)) {
+			for (full = SB_FULL; full < __NR_SB_FULLNESS; full++) {
+				list_for_each_entry(sb,
+					&zone->spb_lists[SB_TAINTED][full], list) {
+					int co;
+
+					if (!sb->nr_free_pages)
+						continue;
+					for (co = min_t(int, pageblock_order - 1,
+							NR_PAGE_ORDERS - 1);
+					     co >= (int)order;
+					     --co) {
+						current_order = co;
+						area = &sb->free_area[current_order];
+						page = get_page_from_free_area(
+							area, MIGRATE_MOVABLE);
+						if (!page)
+							continue;
+						if (get_pageblock_isolate(page))
+							continue;
+						if (is_migrate_cma(
+							get_pageblock_migratetype(page)))
+							continue;
+						page_del_and_expand(zone, page,
+							order, current_order,
+							MIGRATE_MOVABLE);
+						__spb_set_has_type(page,
+							migratetype);
+						if (spb_below_shrink_high_water(sb))
+							queue_spb_slab_shrink(zone);
+						trace_mm_page_alloc_zone_locked(
+							page, order, migratetype,
+							pcp_allowed_order(order) &&
+							migratetype < MIGRATE_PCPTYPES);
+						return page;
+					}
+				}
+			}
+		}
 	}
 
 	/*
-- 
2.52.0
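
For readers following the pass ordering, here is a userspace toy model of
the fall-through described in the commit message. It is only a sketch
under stated assumptions and is not kernel code: every toy_* name is
hypothetical, free lists are reduced to per-migratetype page counters,
Pass 1 is assumed to look only at tainted SPBs, and Passes 2/2b
(whole-pageblock claims), allocation orders, isolation and CMA checks are
left out. It shows just that a non-movable request tries the MOVABLE
counter of an already-tainted SPB (Pass 2d) before a clean SPB gets
tainted (Pass 3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy migratetypes; the names echo the patch but are hypothetical. */
enum toy_mt { TOY_UNMOVABLE, TOY_RECLAIMABLE, TOY_MOVABLE, TOY_NR_MT };

/* Which pass satisfied the request in this toy model. */
enum toy_outcome { TOY_PASS_1, TOY_PASS_2C, TOY_PASS_2D, TOY_PASS_3, TOY_FAIL };

/* A superpageblock reduced to a taint flag and per-list free-page counts. */
struct toy_spb {
	bool tainted;
	unsigned long nr_free[TOY_NR_MT];
};

/* Take one page from the given free list of the first tainted SPB that has one. */
static bool toy_take_from_tainted(struct toy_spb *spbs, size_t n, enum toy_mt list)
{
	for (size_t i = 0; i < n; i++) {
		if (spbs[i].tainted && spbs[i].nr_free[list]) {
			spbs[i].nr_free[list]--;
			return true;
		}
	}
	return false;
}

/* Allocate one page for a non-movable migratetype, trying passes in order. */
enum toy_outcome toy_alloc_nonmovable(struct toy_spb *spbs, size_t n, enum toy_mt mt)
{
	assert(mt == TOY_UNMOVABLE || mt == TOY_RECLAIMABLE);
	enum toy_mt other = (mt == TOY_UNMOVABLE) ? TOY_RECLAIMABLE : TOY_UNMOVABLE;

	/* Pass 1 (simplified): the requesting migratetype's own list. */
	if (toy_take_from_tainted(spbs, n, mt))
		return TOY_PASS_1;
	/* Pass 2c: borrow from the opposite non-movable list. */
	if (toy_take_from_tainted(spbs, n, other))
		return TOY_PASS_2C;
	/* Pass 2d: borrow from the MOVABLE list of a tainted SPB. */
	if (toy_take_from_tainted(spbs, n, TOY_MOVABLE))
		return TOY_PASS_2D;

	/* Pass 3 (simplified): taint the first clean SPB with any free page. */
	for (size_t i = 0; i < n; i++) {
		if (spbs[i].tainted)
			continue;
		for (int l = 0; l < TOY_NR_MT; l++) {
			if (spbs[i].nr_free[l]) {
				spbs[i].tainted = true;
				spbs[i].nr_free[l]--;
				return TOY_PASS_3;
			}
		}
	}
	return TOY_FAIL;
}
```

With one tainted SPB holding only MOVABLE free pages and one clean SPB,
non-movable requests in this model are served by Pass 2d until the
tainted SPB's MOVABLE counter reaches zero; only then is the clean SPB
tainted. Dropping the Pass 2d step from the function makes the very
first request taint the clean SPB, which is the behaviour the patch aims
to avoid.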