From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
	willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
	ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
	Rik van Riel, Rik van Riel
Subject: [RFC PATCH 28/45] mm: page_alloc: keep PCP refill in tainted SPBs across owned pageblocks
Date: Thu, 30 Apr 2026 16:20:57 -0400
Message-ID: <20260430202233.111010-29-riel@surriel.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>
References: <20260430202233.111010-1-riel@surriel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Rik van Riel

rmqueue_bulk Phase 2 walks SB_TAINTED superpageblocks looking for
sub-pageblock free fragments, so a PCP refill can be satisfied
without tainting a clean SPB.
The original Phase 2 abandons a candidate pageblock entirely if
pbd->cpu != 0 (already owned by some CPU), to avoid two CPUs holding
PCPBuddy pages from the same pageblock, which would let the PCP merge
pass corrupt the other CPU's PCP list.

On systems with many CPUs (88+) and many tainted SPBs (~50% on a
16 GiB devvm under stress), nearly every free fragment in a tainted
SPB lives in a pageblock already PCPBuddy-owned by some CPU. Phase 2
skips through the entire SPB without finding anything usable, the
atomic alloc falls through to the slowpath, and clean SPBs get
tainted.

Take the page anyway when the source pageblock is owned, but skip the
ownership claim and PCPBuddy marking. Phase 3 / __rmqueue_smallest
already pull plain non-PCPBuddy pages from owned pageblocks the same
way; the hazard is specifically about two CPUs holding PCPBuddy pages
from the same pageblock, not about a plain non-PCPBuddy page
coexisting with another CPU's PCPBuddy entries.

Pass 0 (owned-block recovery) is only meaningful when we actually
claimed ownership, so register on owned_blocks only when claim_pb is
set.

Fixes: 266461cd5442 ("mm: page_alloc: adopt partial pageblocks from tainted superpageblocks")
Signed-off-by: Rik van Riel
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
 mm/page_alloc.c | 50 +++++++++++++++++++++++++++++---------------------
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f0fdfe8c9a45..a09660a06ed3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4133,6 +4133,7 @@ static bool rmqueue_bulk(struct zone *zone, unsigned int order,
 				&zone->spb_lists[SB_TAINTED][full], list) {
 			struct page *page;
 			int found_order = -1;
+			bool claim_pb;
 
 			if (sb->nr_free_pages < pageblock_nr_pages / 4)
 				continue;
@@ -4156,33 +4157,39 @@ static bool rmqueue_bulk(struct zone *zone, unsigned int order,
 				continue;
 
 			/*
-			 * Check that this pageblock isn't already
-			 * owned by another CPU. If it is, two CPUs
-			 * would have PCPBuddy pages from the same
-			 * pageblock, and the PCP merge pass could
-			 * corrupt the other CPU's PCP list.
+			 * Found a free fragment in a tainted SPB. Take
+			 * it from the buddy.
+			 *
+			 * If the source pageblock is unowned, claim it:
+			 * mark our pages PagePCPBuddy and register the
+			 * block on owned_blocks so Pass 0 can recover
+			 * remaining fragments on future refills.
+			 *
+			 * If the source pageblock is already owned by
+			 * some CPU (us or another), take the page as a
+			 * plain non-PCPBuddy fragment — the same way
+			 * Phase 3 / __rmqueue_smallest would. Setting
+			 * PagePCPBuddy here would let two CPUs hold
+			 * PCPBuddy pages from the same pageblock, and
+			 * the PCP merge pass could then corrupt the
+			 * other CPU's PCP list.
+			 *
+			 * Set PB_has_ either way (bypasses
+			 * page_del_and_expand which normally does the
+			 * PB_has tracking); idempotent if already set.
 			 */
 			pbd = pfn_to_pageblock(page, page_to_pfn(page));
-			if (pbd->cpu != 0)
-				continue;
+			claim_pb = (pbd->cpu == 0);
 
-			/*
-			 * Found a free chunk in an unowned pageblock.
-			 * Take it from buddy, claim ownership, and
-			 * set PCPBuddy. Pass 0 will grab remaining
-			 * buddy entries on future refills.
-			 *
-			 * Set PB_has_ since we bypass
-			 * page_del_and_expand (which normally does
-			 * PB_has tracking).
-			 */
 			del_page_from_free_list(page, zone, found_order, migratetype);
 			__spb_set_has_type(page, migratetype);
-			set_pcpblock_owner(page, cpu);
-			__SetPagePCPBuddy(page);
+			if (claim_pb) {
+				set_pcpblock_owner(page, cpu);
+				__SetPagePCPBuddy(page);
+			}
 
 			pcp_enqueue_tail(pcp, page, migratetype, found_order);
 			refilled += 1 << found_order;
@@ -4190,9 +4197,10 @@ static bool rmqueue_bulk(struct zone *zone, unsigned int order,
 			/*
 			 * Register for Phase 0 recovery so future
 			 * drains from this pageblock can be swept
-			 * back efficiently.
+			 * back efficiently. Only meaningful when we
+			 * actually claimed ownership above.
 			 */
-			if (list_empty(&pbd->cpu_node))
+			if (claim_pb && list_empty(&pbd->cpu_node))
 				list_add(&pbd->cpu_node, &pcp->owned_blocks);
-- 
2.52.0