From: JP Kobryn
To: akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org,
	liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, jackmanb@google.com,
	hannes@cmpxchg.org, ziy@nvidia.com, fvdl@google.com,
	linux-mm@kvack.org
Cc: shakeel.butt@linux.dev, usama.arif@linux.dev,
	linux-kernel@vger.kernel.org
Subject: [PATCH] mm/page_alloc: use existing highatomic reserves on the buddy fastpath
Date: Tue, 16 Jun 2026 12:14:20 -0700
Message-ID: <20260616191420.52556-1-jp.kobryn@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

ALLOC_HIGHATOMIC currently provides both access to MIGRATE_HIGHATOMIC
free pages and permission to create new highatomic pageblock reserves.
This makes it unsuitable for the fastpath.

However, the fastpath can reach rmqueue_buddy() while MIGRATE_HIGHATOMIC
reserves have free pages available. In this situation, the allocation
can fall back to other migratetypes without trying those reserves first.

Allow high-priority non-blocking allocations above order-0 and up to the
costly order to use existing MIGRATE_HIGHATOMIC reserves on the buddy
fastpath without granting permission to grow these reserves.

Add ALLOC_HIGHATOMIC_RESERVE for allocations that may both access
MIGRATE_HIGHATOMIC and grow the reserves. Change the semantics of
ALLOC_HIGHATOMIC so that it may only access the reserves.

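The eligibility rule described above can be sketched as a standalone
user-space C predicate. This is only an illustration under stated
assumptions: the SK_GFP_* bit values are placeholders invented for this
sketch and are not the kernel's real __GFP_* definitions, the sk_ names
are hypothetical, and the function is a model of the rule rather than
the code in the patch.

```c
#include <assert.h>

/* Placeholder request bits for this sketch only (not the kernel's values). */
#define SK_GFP_HIGH            0x1u  /* stands in for __GFP_HIGH */
#define SK_GFP_DIRECT_RECLAIM  0x2u  /* stands in for __GFP_DIRECT_RECLAIM */
#define SK_GFP_NOMEMALLOC      0x4u  /* stands in for __GFP_NOMEMALLOC */

#define SK_COSTLY_ORDER        3u     /* PAGE_ALLOC_COSTLY_ORDER is 3 */
#define SK_ALLOC_HIGHATOMIC    0x200u /* value shown in the patch */

/*
 * Model of the fastpath rule: return the access-only flag when the
 * request is high priority, cannot enter direct reclaim, is not marked
 * NOMEMALLOC, and has an order in the range 1..SK_COSTLY_ORDER.
 * Otherwise return 0.
 */
unsigned int sk_highatomic_fastpath(unsigned int gfp, unsigned int order)
{
	if (order == 0 || order > SK_COSTLY_ORDER)
		return 0;
	if (!(gfp & SK_GFP_HIGH))
		return 0;
	if (gfp & (SK_GFP_DIRECT_RECLAIM | SK_GFP_NOMEMALLOC))
		return 0;

	return SK_ALLOC_HIGHATOMIC;
}

/* Usage: a few requests and the flag each one would receive. */
void sk_highatomic_fastpath_examples(void)
{
	/* Order-1, high priority, non-blocking: may try the reserves. */
	assert(sk_highatomic_fastpath(SK_GFP_HIGH, 1) == SK_ALLOC_HIGHATOMIC);
	/* Order-0 is excluded. */
	assert(sk_highatomic_fastpath(SK_GFP_HIGH, 0) == 0);
	/* Anything above the costly order is excluded. */
	assert(sk_highatomic_fastpath(SK_GFP_HIGH, 4) == 0);
	/* A request that may enter direct reclaim is excluded. */
	assert(sk_highatomic_fastpath(SK_GFP_HIGH | SK_GFP_DIRECT_RECLAIM, 2) == 0);
}
```

In this model the helper only ever returns the access-only flag, which
matches the stated intent that the fastpath never gains permission to
grow the reserves.
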
A UDP receive workload was run with free MIGRATE_HIGHATOMIC pageblocks
available in the target zone. Before this patch, the workload did not
consume these blocks. With this patch, comparable runs consumed
available blocks for 96-100% of eligible order-1 atomic allocations
reaching the buddy path, with no highatomic misses observed.

The workload did not grow highatomic reserves and NAPI page-frag
allocations remained healthy with no failures or order-0 fallbacks.

Signed-off-by: JP Kobryn
---
 mm/internal.h   |  4 +++-
 mm/page_alloc.c | 34 +++++++++++++++++++++++++++-------
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 181e79f1d6a2..a7693a9fdd29 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1477,9 +1477,11 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_TRYLOCK		0x400 /* Only use spin_trylock in allocation path */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+#define ALLOC_HIGHATOMIC_RESERVE 0x1000 /* Allows growing MIGRATE_HIGHATOMIC reserves */
 
 /* Flags that allow allocations below the min watermark. */
-#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+#define ALLOC_RESERVES (ALLOC_NON_BLOCK | ALLOC_MIN_RESERVE | \
+			ALLOC_HIGHATOMIC | ALLOC_OOM | ALLOC_HIGHATOMIC_RESERVE)
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee902a468c2f..e1c28bc0ba3f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3222,7 +3222,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 		} else {
 			spin_lock_irqsave(&zone->lock, flags);
 		}
-		if (alloc_flags & ALLOC_HIGHATOMIC)
+		if (alloc_flags & (ALLOC_HIGHATOMIC | ALLOC_HIGHATOMIC_RESERVE))
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
 			enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
@@ -3250,7 +3250,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 	 * If this is a high-order atomic allocation then check
 	 * if the pageblock should be reserved for the future
 	 */
-	if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
+	if (unlikely(alloc_flags & ALLOC_HIGHATOMIC_RESERVE))
 		reserve_highatomic_pageblock(page, order, zone);
 
 	__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
@@ -3333,9 +3333,10 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 			 * Instead, direct it towards the reserves by
 			 * returning NULL, which will make the caller fall
 			 * back to rmqueue_buddy. This will try to use the
-			 * reserves first and grow them if needed.
+			 * reserves first and grow them if permitted by
+			 * the ALLOC_HIGHATOMIC_RESERVE flag.
 			 */
-			if (alloc_flags & ALLOC_HIGHATOMIC)
+			if (alloc_flags & (ALLOC_HIGHATOMIC | ALLOC_HIGHATOMIC_RESERVE))
 				return NULL;
 
 			alloced = rmqueue_bulk(zone, order,
@@ -3653,7 +3654,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			return true;
 		}
 #endif
-		if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
+		if ((alloc_flags & (ALLOC_HIGHATOMIC | ALLOC_HIGHATOMIC_RESERVE | ALLOC_OOM)) &&
 		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
 			return true;
 		}
@@ -3773,6 +3774,24 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
 	return alloc_flags;
 }
 
+/*
+ * Let high-priority non-blocking allocations above order-0 and up
+ * to the costly order try to use existing MIGRATE_HIGHATOMIC
+ * reserves on the fastpath.
+ */
+static inline unsigned int
+alloc_flags_highatomic_fastpath(gfp_t gfp_mask, unsigned int order)
+{
+	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
+		return 0;
+	if (!(gfp_mask & __GFP_HIGH))
+		return 0;
+	if (gfp_mask & (__GFP_DIRECT_RECLAIM | __GFP_NOMEMALLOC))
+		return 0;
+
+	return ALLOC_HIGHATOMIC;
+}
+
 /* Must be called after current_gfp_context() which can change gfp_mask */
 static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
 						  unsigned int alloc_flags)
@@ -4504,7 +4523,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 		alloc_flags |= ALLOC_NON_BLOCK;
 
 		if (order > 0 && (alloc_flags & ALLOC_MIN_RESERVE))
-			alloc_flags |= ALLOC_HIGHATOMIC;
+			alloc_flags |= ALLOC_HIGHATOMIC_RESERVE;
 	}
 
 	/*
@@ -5298,7 +5317,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 	 * Forbid the first pass from falling back to types that fragment
 	 * memory until all local zones are considered.
 	 */
-	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
+	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp) |
+		       alloc_flags_highatomic_fastpath(alloc_gfp, order);
 
 	/* First allocation attempt */
 	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
-- 
2.54.0
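
As a reading aid, the split between reserve access and reserve growth
that the rmqueue_buddy() hunks introduce can be modelled in user-space C
roughly as follows. The two flag values are the ones shown in the patch;
the sk_ helper names are hypothetical and exist only in this sketch,
which does not touch any real allocator state.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_ALLOC_HIGHATOMIC          0x200u   /* patch: access only */
#define SK_ALLOC_HIGHATOMIC_RESERVE  0x1000u  /* patch: access and grow */

/* Either flag lets the allocation try MIGRATE_HIGHATOMIC free pages first. */
bool sk_may_use_reserves(unsigned int alloc_flags)
{
	return (alloc_flags &
		(SK_ALLOC_HIGHATOMIC | SK_ALLOC_HIGHATOMIC_RESERVE)) != 0;
}

/* Only the RESERVE flag lets a successful allocation reserve a pageblock. */
bool sk_may_grow_reserves(unsigned int alloc_flags)
{
	return (alloc_flags & SK_ALLOC_HIGHATOMIC_RESERVE) != 0;
}

/* Usage: the fastpath flag can consume reserves but never grow them. */
void sk_reserve_split_examples(void)
{
	assert(sk_may_use_reserves(SK_ALLOC_HIGHATOMIC));
	assert(!sk_may_grow_reserves(SK_ALLOC_HIGHATOMIC));

	assert(sk_may_use_reserves(SK_ALLOC_HIGHATOMIC_RESERVE));
	assert(sk_may_grow_reserves(SK_ALLOC_HIGHATOMIC_RESERVE));

	assert(!sk_may_use_reserves(0));
	assert(!sk_may_grow_reserves(0));
}
```

Keeping the two checks separate mirrors the diff: the access test uses
the combined mask, while the call to reserve_highatomic_pageblock() is
gated on ALLOC_HIGHATOMIC_RESERVE alone.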