From mboxrd@z Thu Jan  1 00:00:00 1970
From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
	willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
	ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
	Rik van Riel <riel@surriel.com>
Subject: [RFC PATCH 09/45] mm: page_alloc: introduce superpageblock metadata for 1GB anti-fragmentation
Date: Thu, 30 Apr 2026 16:20:38 -0400
Message-ID: <20260430202233.111010-10-riel@surriel.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>
References: <20260430202233.111010-1-riel@surriel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Introduce a 1GB (PUD-sized) "superpageblock" data structure to track
pageblock composition at a coarser granularity, enabling future steering
of unmovable/reclaimable allocations into already-tainted superpageblocks
and preserving clean superpageblocks for 1GB hugepage allocation.
Each superpageblock groups SUPERPAGEBLOCK_NR_PAGEBLOCKS pageblocks (512
on x86_64 with 2MB pageblocks) and maintains:

- Counts of pageblocks by migratetype (nr_free, nr_unmovable,
  nr_reclaimable, nr_movable, nr_reserved)
- A list_head for future organization by fullness category
- Identity (start_pfn, zone pointer)

Superpageblock counters are maintained by hooking into
init_pageblock_migratetype(). Memory holes and firmware-reserved regions
are tracked as reserved pageblocks: all slots start out reserved during
setup, and the reserved count is decremented as
init_pageblock_migratetype() claims each real pageblock.

The superpageblock array is allocated per-zone during boot via memblock.
At ~48 bytes per superpageblock (~12KB for a 256GB system), the overhead
is negligible.

This is pure bookkeeping with no allocation behavior change.

Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
 include/linux/mmzone.h | 57 ++++++++++++++++++++++++++
 mm/mm_init.c           | 90 ++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c        | 65 ++++++++++++++++++++++++++++++
 3 files changed, 212 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2ab45d1133d9..a0e8ce4b7b79 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -877,6 +877,43 @@ enum zone_type {
 
 #define ASYNC_AND_SYNC 2
 
+/*
+ * Superpageblock: 1GB (PUD-sized) region for anti-fragmentation tracking.
+ *
+ * Groups pageblocks to steer unmovable/reclaimable allocations into
+ * already-tainted superpageblocks, preserving clean superpageblocks for 1GB
+ * hugepage allocation.
+ *
+ * SUPERPAGEBLOCK_ORDER derived from PUD geometry:
+ *   x86_64: PUD_SHIFT=30, PAGE_SHIFT=12 → order 18 → 1GB
+ * Each superpageblock contains SUPERPAGEBLOCK_NR_PAGEBLOCKS pageblocks
+ * (512 on x86_64 with 2MB pageblocks).
+ */
+#define SUPERPAGEBLOCK_ORDER		(PUD_SHIFT - PAGE_SHIFT)
+#define SUPERPAGEBLOCK_NR_PAGES		(1UL << SUPERPAGEBLOCK_ORDER)
+
+/*
+ * SUPERPAGEBLOCK_NR_PAGEBLOCKS depends on pageblock_order which may be
+ * variable (CONFIG_HUGETLB_PAGE_SIZE_VARIABLE).
+ */
+#define SUPERPAGEBLOCK_NR_PAGEBLOCKS	(1UL << (SUPERPAGEBLOCK_ORDER - pageblock_order))
+
+struct superpageblock {
+	/* Pageblock counts by current migratetype */
+	u16 nr_free;
+	u16 nr_unmovable;
+	u16 nr_reclaimable;
+	u16 nr_movable;
+	u16 nr_reserved;	/* holes, firmware, etc. */
+
+	/* For organizing superpageblocks by fullness category */
+	struct list_head list;
+
+	/* Identity */
+	unsigned long start_pfn;
+	struct zone *zone;
+};
+
 struct zone {
 	/* Read-mostly fields */
@@ -919,6 +956,11 @@ struct zone {
 	struct pageblock_data *pageblock_data;
 #endif /* CONFIG_SPARSEMEM */
 
+	/* Superpageblock array for 1GB anti-fragmentation tracking */
+	struct superpageblock *superpageblocks;
+	unsigned long nr_superpageblocks;
+	unsigned long superpageblock_base_pfn;	/* 1GB-aligned base */
+
 	/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
 	unsigned long zone_start_pfn;
@@ -1059,6 +1101,21 @@
 	atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS];
 } ____cacheline_internodealigned_in_smp;
 
+static inline struct superpageblock *pfn_to_superpageblock(struct zone *zone,
+							   unsigned long pfn)
+{
+	unsigned long idx;
+
+	if (!zone->superpageblocks)
+		return NULL;
+
+	idx = (pfn - zone->superpageblock_base_pfn) >> SUPERPAGEBLOCK_ORDER;
+	if (idx >= zone->nr_superpageblocks)
+		return NULL;
+
+	return &zone->superpageblocks[idx];
+}
+
 enum pgdat_flags {
 	PGDAT_WRITEBACK,		/* reclaim scanning has recently found
					 * many pages under writeback
diff --git a/mm/mm_init.c b/mm/mm_init.c
index b3f83452de72..1fb62342d1c6 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1517,6 +1517,95 @@ static void __ref setup_usemap(struct zone *zone)
 static inline void setup_usemap(struct zone *zone) {}
 #endif /* CONFIG_SPARSEMEM */
 
+/**
+ * init_one_superpageblock - initialize a single superpageblock
+ * @sb: superpageblock to initialize
+ * @zone: owning zone
+ * @start_pfn: start PFN for this superpageblock
+ * @zone_start: zone start PFN (for clipping)
+ * @zone_end: zone end PFN (for clipping)
+ *
+ * Zero counters, compute the zone-clipped pageblock count.
+ * Used by both boot-time setup and memory hotplug resize.
+ */
+static void __meminit init_one_superpageblock(struct superpageblock *sb,
+					      struct zone *zone,
+					      unsigned long start_pfn,
+					      unsigned long zone_start,
+					      unsigned long zone_end)
+{
+	unsigned long sb_end = start_pfn + SUPERPAGEBLOCK_NR_PAGES;
+	unsigned long pb_start = max(start_pfn, zone_start);
+	unsigned long pb_end = min(sb_end, zone_end);
+	u16 actual_pbs;
+
+	sb->nr_unmovable = 0;
+	sb->nr_reclaimable = 0;
+	sb->nr_movable = 0;
+	sb->nr_free = 0;
+	INIT_LIST_HEAD(&sb->list);
+	sb->start_pfn = start_pfn;
+	sb->zone = zone;
+
+	/*
+	 * Start with all pageblock slots as reserved.
+	 * init_pageblock_migratetype() will decrement nr_reserved and
+	 * increment the appropriate counter for each real pageblock.
+	 * Holes and firmware-reserved regions stay counted as reserved.
+	 *
+	 * Only count pageblocks that fall within the zone's span.
+	 * The first and last superpageblocks may extend beyond the
+	 * zone boundaries. Use round-up division because a partial
+	 * pageblock at the zone boundary still gets initialized by
+	 * init_pageblock_migratetype().
+	 */
+	actual_pbs = (pb_end > pb_start) ?
+		((pb_end - pb_start + pageblock_nr_pages - 1) >>
+		 pageblock_order) : 0;
+	sb->nr_reserved = actual_pbs;
+}
+
+static void __init setup_superpageblocks(struct zone *zone)
+{
+	unsigned long zone_start = zone->zone_start_pfn;
+	unsigned long zone_end = zone_start + zone->spanned_pages;
+	unsigned long sb_base, nr_superpageblocks;
+	size_t alloc_size;
+	unsigned long i;
+
+	zone->superpageblocks = NULL;
+	zone->nr_superpageblocks = 0;
+	zone->superpageblock_base_pfn = 0;
+
+	if (!zone->spanned_pages)
+		return;
+
+	/*
+	 * Superpageblocks must be 1GB (PUD) aligned. Align the base down
+	 * and the end up to cover all 1GB regions the zone spans.
+	 */
+	sb_base = ALIGN_DOWN(zone_start, SUPERPAGEBLOCK_NR_PAGES);
+	nr_superpageblocks = (ALIGN(zone_end, SUPERPAGEBLOCK_NR_PAGES) - sb_base) >>
+			     SUPERPAGEBLOCK_ORDER;
+
+	alloc_size = nr_superpageblocks * sizeof(struct superpageblock);
+	zone->superpageblocks = memblock_alloc_node(alloc_size, SMP_CACHE_BYTES,
+						    zone_to_nid(zone));
+	if (!zone->superpageblocks) {
+		pr_warn("Failed to allocate %zu bytes for zone %s superpageblocks\n",
+			alloc_size, zone->name);
+		return;
+	}
+
+	zone->nr_superpageblocks = nr_superpageblocks;
+	zone->superpageblock_base_pfn = sb_base;
+
+	for (i = 0; i < nr_superpageblocks; i++)
+		init_one_superpageblock(&zone->superpageblocks[i], zone,
+					sb_base + (i << SUPERPAGEBLOCK_ORDER),
+					zone_start, zone_end);
+}
+
 #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
 
 /* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
@@ -1625,6 +1714,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 			continue;
 
 		setup_usemap(zone);
+		setup_superpageblocks(zone);
 		init_currently_empty_zone(zone, zone->zone_start_pfn, size);
 	}
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d0a4de435842..a3837a30a7eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -501,6 +501,62 @@ void clear_pfnblock_bit(const struct page *page, unsigned long pfn,
 	clear_bit(pb_bit, get_pfnblock_flags_word(page, pfn));
 }
 
+/*
+ * Map migratetype to PB_has_* bit index. Returns -1 for types that
+ * don't have a tracking bit (e.g. MIGRATE_ISOLATE).
+ */
+static inline int migratetype_to_has_bit(int migratetype)
+{
+	switch (migratetype) {
+	case MIGRATE_UNMOVABLE:
+	case MIGRATE_HIGHATOMIC:
+		return PB_has_unmovable;
+	case MIGRATE_RECLAIMABLE:
+		return PB_has_reclaimable;
+	case MIGRATE_MOVABLE:
+#ifdef CONFIG_CMA
+	case MIGRATE_CMA:
+#endif
+		return PB_has_movable;
+	default:
+		return -1;
+	}
+}
+
+/*
+ * __spb_set_has_type - set PB_has_* and increment type counter
+ *
+ * Idempotent: only increments the counter on the 0→1 bit transition.
+ */
+static void __spb_set_has_type(struct page *page, int migratetype)
+{
+	unsigned long pfn = page_to_pfn(page);
+	struct superpageblock *sb = pfn_to_superpageblock(page_zone(page), pfn);
+	int bit;
+
+	if (!sb)
+		return;
+
+	bit = migratetype_to_has_bit(migratetype);
+	if (bit < 0)
+		return;
+
+	if (!get_pfnblock_bit(page, pfn, bit)) {
+		set_pfnblock_bit(page, pfn, bit);
+		switch (bit) {
+		case PB_has_unmovable:
+			sb->nr_unmovable++;
+			break;
+		case PB_has_reclaimable:
+			sb->nr_reclaimable++;
+			break;
+		case PB_has_movable:
+			sb->nr_movable++;
+			break;
+		}
+	}
+}
+
 /**
  * set_pageblock_migratetype - Set the migratetype of a pageblock
  * @page: The page within the block of interest
@@ -534,6 +590,7 @@ void __meminit init_pageblock_migratetype(struct page *page,
 {
 	unsigned long pfn = page_to_pfn(page);
 	struct pageblock_data *pbd;
+	struct superpageblock *sb;
 	unsigned long flags;
 
 	if (unlikely(page_group_by_mobility_disabled &&
@@ -557,6 +614,14 @@
 	pbd = pfn_to_pageblock(page, pfn);
 	pbd->block_pfn = pfn;
 	INIT_LIST_HEAD(&pbd->cpu_node);
+
+	/* Transition from reserved (boot default) to initial migratetype */
+	sb = pfn_to_superpageblock(page_zone(page), pfn);
+	if (sb) {
+		if (sb->nr_reserved)
+			sb->nr_reserved--;
+		__spb_set_has_type(page, migratetype);
+	}
 }
 
 #ifdef CONFIG_DEBUG_VM
-- 
2.52.0