From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
 willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
 ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
 Rik van Riel
Subject: [RFC PATCH 08/45] mm: page_alloc: track actual page contents in pageblock flags
Date: Thu, 30 Apr 2026 16:20:37 -0400
Message-ID: <20260430202233.111010-9-riel@surriel.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>
References: <20260430202233.111010-1-riel@surriel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Rik van Riel

Extend pageblock_data flags with PB_has_unmovable, PB_has_reclaimable,
and PB_has_movable bits to track the actual types of pages allocated
within a pageblock, independent of its intended migratetype.

The flags are set at steal time in try_to_claim_block(), avoiding
overhead on every allocation in __rmqueue_smallest():

1.
Allocation / steal time: when try_to_claim_block() claims a
   pageblock, set the PB_has_* flag corresponding to the allocation's
   migratetype. If unmovable or reclaimable pages are being placed
   into a pageblock that already has PB_has_movable set, queue async
   evacuation of the remaining movable pages.

2. Full pageblock free: when buddy merging reconstructs a complete
   pageblock in __free_one_page(), clear all PB_has_* flags since the
   block is now empty.

3. Migration scan: when isolate_migratepages_block() completes a full
   pageblock scan and finds no movable pages to isolate, clear
   PB_has_movable. This consolidates the clearing for all callers:
   evacuate_pageblock(), compaction, and alloc_contig_range().

This provides the foundation for superpageblock-level steering
decisions: knowing which pageblocks actually contain
unmovable/reclaimable pages allows directing future allocations to
already-tainted regions, keeping clean regions available for large
contiguous allocations.

Signed-off-by: Rik van Riel
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
 include/linux/pageblock-flags.h |  9 ++++
 mm/compaction.c                 | 17 ++++++
 mm/page_alloc.c                 | 93 +++++++++++++++++++++++++--------
 3 files changed, 98 insertions(+), 21 deletions(-)

diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index e046278a01fa..21bfcdf80b2e 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -20,6 +20,15 @@ enum pageblock_bits {
 	PB_migrate_2,
 	PB_compact_skip,/* If set the block is skipped by compaction */
 
+	/*
+	 * Track actual page contents independent of the intended migratetype.
+	 * Set at allocation time; cleared on full pageblock free or when
+	 * migration confirms no pages of that type remain.
+	 */
+	PB_has_unmovable,
+	PB_has_reclaimable,
+	PB_has_movable,
+
 #ifdef CONFIG_MEMORY_ISOLATION
 	/*
 	 * Pageblock isolation is represented with a separate bit, so that

diff --git a/mm/compaction.c b/mm/compaction.c
index 1e8f8eca318c..cf2a5074c473 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -849,6 +849,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	bool skip_on_failure = false;
 	unsigned long next_skip_pfn = 0;
 	bool skip_updated = false;
+	bool movable_skipped = false;
 	int ret = 0;
 
 	cc->migrate_pfn = low_pfn;
@@ -1061,6 +1062,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 				folio = page_folio(page);
 				goto isolate_success;
 			}
+			movable_skipped = true;
 		}
 
 		goto isolate_fail;
@@ -1229,6 +1231,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			unlock_page_lruvec_irqrestore(locked, flags);
 			locked = NULL;
 		}
+		movable_skipped = true;
 		folio_put(folio);
 
 isolate_fail:
@@ -1292,6 +1295,20 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	if (!cc->no_set_skip_hint && valid_page && !skip_updated)
 		set_pageblock_skip(valid_page);
 	update_cached_migrate(cc, low_pfn);
+
+	/*
+	 * Full pageblock scanned with no movable pages isolated.
+	 * Only clear PB_has_movable if no movable pages were
+	 * seen at all. If movable pages exist but could not be
+	 * isolated (pinned, writeback, dirty, etc.), leave the
+	 * flag set so a future migration attempt can try again.
+	 */
+	if (!nr_isolated && !movable_skipped && valid_page &&
+	    get_pfnblock_bit(valid_page, pageblock_start_pfn(start_pfn),
+			     PB_has_movable))
+		clear_pfnblock_bit(valid_page,
+				   pageblock_start_pfn(start_pfn),
+				   PB_has_movable);
 	}
 
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 45c25c4fc7c0..d0a4de435842 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -972,6 +972,30 @@ static void change_pageblock_range(struct page *pageblock_page,
 	}
 }
 
+/*
+ * mark_pageblock_free - handle a pageblock becoming fully free
+ * @page: page at the start of the pageblock
+ * @pfn: page frame number
+ *
+ * Clear stale PCP ownership and actual-contents tracking flags when
+ * buddy merging reconstructs a full pageblock or a whole pageblock is
+ * freed directly. No PCP can still hold pages from this block (otherwise
+ * the buddy merge couldn't have completed), so the ownership entry would
+ * just cause misrouted frees.
+ */
+static void mark_pageblock_free(struct page *page, unsigned long pfn)
+{
+	clear_pcpblock_owner(page);
+
+	/*
+	 * The entire block is now free — clear actual-contents tracking
+	 * flags since no allocated pages remain.
+	 */
+	clear_pfnblock_bit(page, pfn, PB_has_unmovable);
+	clear_pfnblock_bit(page, pfn, PB_has_reclaimable);
+	clear_pfnblock_bit(page, pfn, PB_has_movable);
+}
+
 /*
  * Freeing function for a buddy system allocator.
  *
@@ -1017,19 +1041,14 @@ static inline void __free_one_page(struct page *page,
 	account_freepages(zone, 1 << order, migratetype);
 
 	/*
-	 * For whole blocks, ownership returns to the zone. There are
-	 * no more outstanding frees to route through that CPU's PCP,
-	 * and we don't want to confuse any future users of the pages
-	 * in this block. E.g. rmqueue_buddy().
-	 *
-	 * Check here if a whole block came in directly: pre-merged in
-	 * the PCP, or PCP contended and bypassed.
-	 *
-	 * There is another check in the loop below if a block merges
-	 * up with pages already on the zone buddy.
+	 * When freeing a whole pageblock, clear stale PCP ownership
+	 * and actual-contents tracking flags up front. The in-loop
+	 * check only fires when sub-pageblock pages merge *up to*
+	 * pageblock_order, not when entering at pageblock_order
+	 * directly.
 	 */
 	if (order == pageblock_order)
-		clear_pcpblock_owner(page);
+		mark_pageblock_free(page, pfn);
 
 	while (order < MAX_PAGE_ORDER) {
 		int buddy_mt = migratetype;
@@ -1081,9 +1100,13 @@ static inline void __free_one_page(struct page *page,
 		pfn = combined_pfn;
 		order++;
 
-		/* Clear owner also when we merge up. See above */
+		/*
+		 * If merging has reconstructed a full pageblock,
+		 * clear any stale PCP ownership and actual-contents
+		 * tracking flags.
+		 */
 		if (order == pageblock_order)
-			clear_pcpblock_owner(page);
+			mark_pageblock_free(page, pfn);
 	}
 
 done_merging:
@@ -2469,15 +2492,32 @@ try_to_claim_block(struct zone *zone, struct page *page,
 	set_pageblock_migratetype(pfn_to_page(start_pfn), start_type);
 #ifdef CONFIG_COMPACTION
 	/*
-	 * A movable pageblock was just claimed for unmovable or
-	 * reclaimable use. Queue async evacuation of the remaining
-	 * movable pages so future unmovable/reclaimable allocations
-	 * can stay concentrated in fewer pageblocks.
+	 * Track actual page contents in pageblock flags.
+	 * Mark the pageblock with the type being allocated, and
+	 * if unmovable/reclaimable pages are being placed into a
+	 * pageblock that already has movable pages, queue async
+	 * evacuation of the movable pages.
 	 */
-	if (block_type == MIGRATE_MOVABLE &&
-	    (start_type == MIGRATE_UNMOVABLE ||
-	     start_type == MIGRATE_RECLAIMABLE))
-		queue_pageblock_evacuate(zone, start_pfn);
+	{
+		struct page *start_page = pfn_to_page(start_pfn);
+
+		if (start_type == MIGRATE_UNMOVABLE) {
+			set_pfnblock_bit(start_page, start_pfn,
+					 PB_has_unmovable);
+			if (get_pfnblock_bit(start_page, start_pfn,
+					     PB_has_movable))
+				queue_pageblock_evacuate(zone, start_pfn);
+		} else if (start_type == MIGRATE_RECLAIMABLE) {
+			set_pfnblock_bit(start_page, start_pfn,
+					 PB_has_reclaimable);
+			if (get_pfnblock_bit(start_page, start_pfn,
+					     PB_has_movable))
+				queue_pageblock_evacuate(zone, start_pfn);
+		} else if (start_type == MIGRATE_MOVABLE) {
+			set_pfnblock_bit(start_page, start_pfn,
+					 PB_has_movable);
+		}
+	}
 #endif
 	return __rmqueue_smallest(zone, order, start_type);
 }
@@ -7212,6 +7252,17 @@ static void evacuate_pageblock(struct zone *zone, unsigned long start_pfn)
 
 	if (!list_empty(&cc.migratepages))
 		putback_movable_pages(&cc.migratepages);
+
+	/*
+	 * Re-scan to let isolate_migratepages_block clear PB_has_movable
+	 * if no movable pages remain after evacuation.
+	 */
+	cc.migrate_pfn = start_pfn;
+	cc.nr_migratepages = 0;
+	INIT_LIST_HEAD(&cc.migratepages);
+	isolate_migratepages_range(&cc, start_pfn, end_pfn);
+	if (!list_empty(&cc.migratepages))
+		putback_movable_pages(&cc.migratepages);
 }
 
 static void evacuate_work_fn(struct work_struct *work)
-- 
2.52.0