* [PATCH v2] mm/page_alloc: use existing highatomic reserves on the buddy fastpath

From: JP Kobryn @ 2026-06-17 23:49 UTC
To: akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko, jackmanb, hannes, ziy, fvdl, linux-mm
Cc: shakeel.butt, usama.arif, linux-kernel

ALLOC_HIGHATOMIC currently provides both access to MIGRATE_HIGHATOMIC free
pages and permission to create new highatomic pageblock reserves. This
makes it unsuitable for the fastpath.

However, the fastpath can reach rmqueue_buddy() while MIGRATE_HIGHATOMIC
reserves have free pages available. In this situation, the allocation can
fall back to other migratetypes without trying those reserves first.

Allow high-priority non-blocking allocations above order-0 and up to the
costly order to use existing MIGRATE_HIGHATOMIC reserves on the buddy
fastpath. Change the semantics of ALLOC_HIGHATOMIC so that it only allows
access to the reserves without permission to grow them. Add a new flag
ALLOC_HIGHATOMIC_RESERVE that specifically allows growing the reserves.

A UDP receive workload was run with free MIGRATE_HIGHATOMIC pageblocks
available in the target zone. Before this patch, the workload did not
consume these blocks. With this patch, eligible order-1 allocations
reaching the buddy path consumed existing MIGRATE_HIGHATOMIC pageblocks,
with no highatomic misses observed. The workload did not grow highatomic
reserves and NAPI page-frag allocations remained healthy with no failures
or order-0 fallbacks.

Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
v2:
 - decouple use semantics from ALLOC_HIGHATOMIC_RESERVE
 - update changelog to reflect above change and reword test paragraph
 - adjust comment in PCP path
 - rebase onto Linus' tree ~v7.2-rc1

v1: https://lore.kernel.org/linux-mm/20260616191420.52556-1-jp.kobryn@linux.dev/

 mm/internal.h   |  1 +
 mm/page_alloc.c | 30 +++++++++++++++++++++++++-----
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 5a2ddcf68e0b..6700659615e8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1478,6 +1478,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_TRYLOCK 0x400 /* Only use spin_trylock in allocation path */
 #define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+#define ALLOC_HIGHATOMIC_RESERVE 0x1000 /* Allows growing MIGRATE_HIGHATOMIC reserves */
 
 /* Flags that allow allocations below the min watermark. */
 #define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d49c254174da..ed919e2ac99a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3238,7 +3238,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 	 * If this is a high-order atomic allocation then check
 	 * if the pageblock should be reserved for the future
 	 */
-	if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
+	if (unlikely(alloc_flags & ALLOC_HIGHATOMIC_RESERVE))
 		reserve_highatomic_pageblock(page, order, zone);
 
 	__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
@@ -3320,8 +3320,9 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 			 *
 			 * Instead, direct it towards the reserves by
 			 * returning NULL, which will make the caller fall
-			 * back to rmqueue_buddy. This will try to use the
-			 * reserves first and grow them if needed.
+			 * back to rmqueue_buddy. There it will try to use
+			 * the reserves first and grow them if needed and
+			 * permitted by the ALLOC_HIGHATOMIC_RESERVE flag.
 			 */
 			if (alloc_flags & ALLOC_HIGHATOMIC)
 				return NULL;
@@ -3768,6 +3769,24 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
 	return alloc_flags;
 }
 
+/*
+ * Let high-priority non-blocking allocations above order-0 and up
+ * to the costly order try to use existing MIGRATE_HIGHATOMIC
+ * reserves on the fastpath.
+ */
+static inline unsigned int
+alloc_flags_highatomic_fastpath(gfp_t gfp_mask, unsigned int order)
+{
+	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
+		return 0;
+	if (!(gfp_mask & __GFP_HIGH))
+		return 0;
+	if (gfp_mask & (__GFP_DIRECT_RECLAIM | __GFP_NOMEMALLOC))
+		return 0;
+
+	return ALLOC_HIGHATOMIC;
+}
+
 /* Must be called after current_gfp_context() which can change gfp_mask */
 static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
 		unsigned int alloc_flags)
@@ -4495,7 +4514,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 			alloc_flags |= ALLOC_NON_BLOCK;
 
 			if (order > 0 && (alloc_flags & ALLOC_MIN_RESERVE))
-				alloc_flags |= ALLOC_HIGHATOMIC;
+				alloc_flags |= (ALLOC_HIGHATOMIC | ALLOC_HIGHATOMIC_RESERVE);
 		}
 
 		/*
@@ -5215,7 +5234,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
 	 * Forbid the first pass from falling back to types that fragment
 	 * memory until all local zones are considered.
 	 */
-	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
+	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp) |
+		       alloc_flags_highatomic_fastpath(alloc_gfp, order);
 
 	/* First allocation attempt */
 	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
-- 
2.54.0
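The v2 split can be exercised outside the kernel with a small userspace model. It covers three pieces:

- **Fastpath helper:** a copy of the eligibility tests in `alloc_flags_highatomic_fastpath()`. It grants access only.
- **Slowpath branch:** a simplified version of the highatomic branch of `gfp_to_alloc_flags()`. It grants access plus growth and has no order cap.
- **Growth check:** the `rmqueue_buddy()` test on `ALLOC_HIGHATOMIC_RESERVE`.

`ALLOC_HIGHATOMIC` and `ALLOC_HIGHATOMIC_RESERVE` come from the patch, and `PAGE_ALLOC_COSTLY_ORDER` is 3 in the kernel. The `MODEL_GFP_*` bits and all `model_*` helpers are hypothetical stand-ins, not kernel definitions. The slowpath model also omits cpuset and watermark handling. Treat it as a sketch for reasoning about the flags, not as the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values taken from the patch (mm/internal.h). */
#define ALLOC_HIGHATOMIC         0x200u
#define ALLOC_HIGHATOMIC_RESERVE 0x1000u

/* PAGE_ALLOC_COSTLY_ORDER is 3 in the kernel. */
#define PAGE_ALLOC_COSTLY_ORDER  3u

/* Hypothetical stand-in GFP bits for this model only (not the kernel's values). */
#define MODEL_GFP_HIGH           0x1u
#define MODEL_GFP_DIRECT_RECLAIM 0x2u
#define MODEL_GFP_NOMEMALLOC     0x4u

/* Same tests as v2's alloc_flags_highatomic_fastpath(): access only, never growth. */
static unsigned int model_fastpath_flags(unsigned int gfp, unsigned int order)
{
	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
		return 0;
	if (!(gfp & MODEL_GFP_HIGH))
		return 0;
	if (gfp & (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_NOMEMALLOC))
		return 0;
	return ALLOC_HIGHATOMIC;
}

/*
 * Simplified model of the highatomic branch of gfp_to_alloc_flags() after v2:
 * non-blocking, not __GFP_NOMEMALLOC, __GFP_HIGH and order > 0 get both
 * access and growth. Note there is no order cap here.
 */
static unsigned int model_slowpath_flags(unsigned int gfp, unsigned int order)
{
	if (gfp & (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_NOMEMALLOC))
		return 0;
	if (order > 0 && (gfp & MODEL_GFP_HIGH))
		return ALLOC_HIGHATOMIC | ALLOC_HIGHATOMIC_RESERVE;
	return 0;
}

/* Mirrors the v2 check in rmqueue_buddy(): only the RESERVE flag grows reserves. */
static bool model_may_grow_reserve(unsigned int alloc_flags)
{
	return (alloc_flags & ALLOC_HIGHATOMIC_RESERVE) != 0;
}

/* Usage: an order-1, __GFP_HIGH, non-blocking request on each path. */
static void model_selfcheck(void)
{
	unsigned int fast = model_fastpath_flags(MODEL_GFP_HIGH, 1);
	unsigned int slow = model_slowpath_flags(MODEL_GFP_HIGH, 1);

	/* Fastpath may use existing reserves but never grows them. */
	assert(fast == ALLOC_HIGHATOMIC && !model_may_grow_reserve(fast));
	/* Slowpath may both use and grow them. */
	assert((slow & ALLOC_HIGHATOMIC) && model_may_grow_reserve(slow));
	/* The v2 cap: order 4 gets no fastpath access, yet the slowpath still taps reserves. */
	assert(model_fastpath_flags(MODEL_GFP_HIGH, 4) == 0);
	assert(model_slowpath_flags(MODEL_GFP_HIGH, 4) & ALLOC_HIGHATOMIC);
}
```

The last two assertions show the fastpath/slowpath mismatch for costly orders that comes up in review.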
* Re: [PATCH v2] mm/page_alloc: use existing highatomic reserves on the buddy fastpath

From: Johannes Weiner @ 2026-06-18 18:35 UTC
To: JP Kobryn
Cc: akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko, jackmanb, ziy, fvdl, linux-mm, shakeel.butt, usama.arif, linux-kernel

On Wed, Jun 17, 2026 at 04:49:58PM -0700, JP Kobryn wrote:
> ALLOC_HIGHATOMIC currently provides both access to MIGRATE_HIGHATOMIC free
> pages and permission to create new highatomic pageblock reserves. This
> makes it unsuitable for the fastpath.
>
> However, the fastpath can reach rmqueue_buddy() while MIGRATE_HIGHATOMIC
> reserves have free pages available. In this situation, the allocation can
> fall back to other migratetypes without trying those reserves first.
>
> Allow high-priority non-blocking allocations above order-0 and up to the
> costly order to use existing MIGRATE_HIGHATOMIC reserves on the buddy
> fastpath. Change the semantics of ALLOC_HIGHATOMIC so that it only allows
> access to the reserves without permission to grow them. Add a new flag
> ALLOC_HIGHATOMIC_RESERVE that specifically allows growing the reserves.
>
> A UDP receive workload was run with free MIGRATE_HIGHATOMIC pageblocks
> available in the target zone. Before this patch, the workload did not
> consume these blocks. With this patch, eligible order-1 allocations
> reaching the buddy path consumed existing MIGRATE_HIGHATOMIC pageblocks,
> with no highatomic misses observed. The workload did not grow highatomic
> reserves and NAPI page-frag allocations remained healthy with no failures
> or order-0 fallbacks.

Thanks for digging deeper into this! That's a great find.

> Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> ---
> v2:
>  - decouple use semantics from ALLOC_HIGHATOMIC_RESERVE
>  - update changelog to reflect above change and reword test paragraph
>  - adjust comment in PCP path
>  - rebase onto Linus' tree ~v7.2-rc1
>
> v1: https://lore.kernel.org/linux-mm/20260616191420.52556-1-jp.kobryn@linux.dev/
>
>  mm/internal.h   |  1 +
>  mm/page_alloc.c | 30 +++++++++++++++++++++++++-----
>  2 files changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 5a2ddcf68e0b..6700659615e8 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1478,6 +1478,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
>  #define ALLOC_TRYLOCK 0x400 /* Only use spin_trylock in allocation path */
>  #define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
> +#define ALLOC_HIGHATOMIC_RESERVE 0x1000 /* Allows growing MIGRATE_HIGHATOMIC reserves */
>
>  /* Flags that allow allocations below the min watermark. */
>  #define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d49c254174da..ed919e2ac99a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3238,7 +3238,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  	 * If this is a high-order atomic allocation then check
>  	 * if the pageblock should be reserved for the future
>  	 */
> -	if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
> +	if (unlikely(alloc_flags & ALLOC_HIGHATOMIC_RESERVE))
>  		reserve_highatomic_pageblock(page, order, zone);

You could check ALLOC_WMARK_MIN to determine the slowpath. This way
you wouldn't need another alloc flag:

	/* Slowpath (precarious) high-atomic allocation. Maybe reserve block */
	if (unlikely((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_WMARK_MIN)) == (ALLOC_HIGHATOMIC|ALLOC_WMARK_MIN)))
		reserve_highatomic_pageblock(page, order, zone);

[ we really ought to generalize gfp_has_flags() ... ]

>  	__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
> @@ -3320,8 +3320,9 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
>  			 *
>  			 * Instead, direct it towards the reserves by
>  			 * returning NULL, which will make the caller fall
> -			 * back to rmqueue_buddy. This will try to use the
> -			 * reserves first and grow them if needed.
> +			 * back to rmqueue_buddy. There it will try to use
> +			 * the reserves first and grow them if needed and
> +			 * permitted by the ALLOC_HIGHATOMIC_RESERVE flag.
>  			 */
>  			if (alloc_flags & ALLOC_HIGHATOMIC)
>  				return NULL;
> @@ -3768,6 +3769,24 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
>  	return alloc_flags;
>  }
>
> +/*
> + * Let high-priority non-blocking allocations above order-0 and up
> + * to the costly order try to use existing MIGRATE_HIGHATOMIC
> + * reserves on the fastpath.
> + */
> +static inline unsigned int
> +alloc_flags_highatomic_fastpath(gfp_t gfp_mask, unsigned int order)
> +{
> +	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
> +		return 0;

There seems to be a mismatch between this and gfp_to_alloc_flags()
(slowpath), where slowpath is still allowed to tap highatomic reserves
for costly orders. Is that on purpose?

> +	if (!(gfp_mask & __GFP_HIGH))
> +		return 0;
> +	if (gfp_mask & (__GFP_DIRECT_RECLAIM | __GFP_NOMEMALLOC))
> +		return 0;

This duplicates gfp_to_alloc_flags() logic which seems fragile. How
about the below:

> +
> +	return ALLOC_HIGHATOMIC;
> +}
> +
>  /* Must be called after current_gfp_context() which can change gfp_mask */
>  static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
>  		unsigned int alloc_flags)
> @@ -4495,7 +4514,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  			alloc_flags |= ALLOC_NON_BLOCK;
>
>  			if (order > 0 && (alloc_flags & ALLOC_MIN_RESERVE))
> -				alloc_flags |= ALLOC_HIGHATOMIC;
> +				alloc_flags |= (ALLOC_HIGHATOMIC | ALLOC_HIGHATOMIC_RESERVE);
>  		}
>
>  		/*
> @@ -5215,7 +5234,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>  	 * Forbid the first pass from falling back to types that fragment
>  	 * memory until all local zones are considered.
>  	 */
> -	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
> +	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp) |
> +		       alloc_flags_highatomic_fastpath(alloc_gfp, order);

	alloc_flags |= gfp_to_alloc_flags(gfp, order) & ALLOC_HIGHATOMIC;

Or factor gfp_to_alloc_flags_nonblocking() from gfp_to_alloc_flags()
and reuse that here, to save a few cycles in the fast path.

>  	/* First allocation attempt */
>  	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
> -- 
> 2.54.0
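The review proposes having both paths derive their flags from shared code, with the fastpath masking the result down to `ALLOC_HIGHATOMIC`. That idea can be modeled as below. It assumes the following:

- **Helper name:** `gfp_to_alloc_flags_nonblocking()` is only a name proposed in the review, not an existing kernel function. It is modeled here as the hypothetical `model_gfp_to_alloc_flags_nonblocking()`.
- **Stand-in values:** the `MODEL_GFP_*` bits and `MODEL_ALLOC_NON_BLOCK` are stand-ins.
- **Simplified branch:** cpuset handling and the other flags that `gfp_to_alloc_flags()` sets are omitted.
- **No order cap:** the sketch shows how a shared rule removes the duplicated eligibility tests and the order-cap mismatch. It is not the eventual v3 code.

```c
#include <assert.h>

/* Flag values from the patch (mm/internal.h). */
#define ALLOC_HIGHATOMIC         0x200u
#define ALLOC_HIGHATOMIC_RESERVE 0x1000u
/* Stand-in for ALLOC_NON_BLOCK in this model only. */
#define MODEL_ALLOC_NON_BLOCK    0x10u

/* Hypothetical stand-in GFP bits (not the kernel's values). */
#define MODEL_GFP_HIGH           0x1u
#define MODEL_GFP_DIRECT_RECLAIM 0x2u
#define MODEL_GFP_NOMEMALLOC     0x4u

/*
 * Hypothetical gfp_to_alloc_flags_nonblocking(): the non-blocking branch of
 * gfp_to_alloc_flags(), factored out so both paths share one eligibility rule.
 */
static unsigned int model_gfp_to_alloc_flags_nonblocking(unsigned int gfp,
							 unsigned int order)
{
	unsigned int flags = 0;

	if (gfp & (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_NOMEMALLOC))
		return 0;
	flags |= MODEL_ALLOC_NON_BLOCK;
	if (order > 0 && (gfp & MODEL_GFP_HIGH))
		flags |= ALLOC_HIGHATOMIC | ALLOC_HIGHATOMIC_RESERVE;
	return flags;
}

/* Slowpath keeps everything the helper grants, including reserve growth. */
static unsigned int model_slowpath(unsigned int gfp, unsigned int order)
{
	return model_gfp_to_alloc_flags_nonblocking(gfp, order);
}

/* Fastpath keeps only reserve access, dropping growth and the other bits. */
static unsigned int model_fastpath(unsigned int gfp, unsigned int order)
{
	return model_gfp_to_alloc_flags_nonblocking(gfp, order) & ALLOC_HIGHATOMIC;
}

/* Usage: one rule, two views of it. */
static void model_nonblocking_selfcheck(void)
{
	assert(model_fastpath(MODEL_GFP_HIGH, 1) == ALLOC_HIGHATOMIC);
	assert(model_slowpath(MODEL_GFP_HIGH, 1) & ALLOC_HIGHATOMIC_RESERVE);
	/* With no separate fastpath cap, a costly order (4) also gets access. */
	assert(model_fastpath(MODEL_GFP_HIGH, 4) == ALLOC_HIGHATOMIC);
	/* Order-0 never qualifies on either path. */
	assert(model_fastpath(MODEL_GFP_HIGH, 0) == 0);
}
```

Because the fastpath value is a mask of the slowpath value, any future change to the eligibility rule applies to both paths automatically.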
* Re: [PATCH v2] mm/page_alloc: use existing highatomic reserves on the buddy fastpath

From: JP Kobryn @ 2026-06-19 21:45 UTC
To: Johannes Weiner
Cc: akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko, jackmanb, ziy, fvdl, linux-mm, shakeel.butt, usama.arif, linux-kernel

On 6/18/26 11:35 AM, Johannes Weiner wrote:
> On Wed, Jun 17, 2026 at 04:49:58PM -0700, JP Kobryn wrote:
>> ALLOC_HIGHATOMIC currently provides both access to MIGRATE_HIGHATOMIC free
>> pages and permission to create new highatomic pageblock reserves. This
>> makes it unsuitable for the fastpath.
>>
>> However, the fastpath can reach rmqueue_buddy() while MIGRATE_HIGHATOMIC
>> reserves have free pages available. In this situation, the allocation can
>> fall back to other migratetypes without trying those reserves first.
>>
>> Allow high-priority non-blocking allocations above order-0 and up to the
>> costly order to use existing MIGRATE_HIGHATOMIC reserves on the buddy
>> fastpath. Change the semantics of ALLOC_HIGHATOMIC so that it only allows
>> access to the reserves without permission to grow them. Add a new flag
>> ALLOC_HIGHATOMIC_RESERVE that specifically allows growing the reserves.
>>
>> A UDP receive workload was run with free MIGRATE_HIGHATOMIC pageblocks
>> available in the target zone. Before this patch, the workload did not
>> consume these blocks. With this patch, eligible order-1 allocations
>> reaching the buddy path consumed existing MIGRATE_HIGHATOMIC pageblocks,
>> with no highatomic misses observed. The workload did not grow highatomic
>> reserves and NAPI page-frag allocations remained healthy with no failures
>> or order-0 fallbacks.
>
> Thanks for digging deeper into this! That's a great find.

Thanks :)

>
>> Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
>> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>> ---
>> v2:
>>  - decouple use semantics from ALLOC_HIGHATOMIC_RESERVE
>>  - update changelog to reflect above change and reword test paragraph
>>  - adjust comment in PCP path
>>  - rebase onto Linus' tree ~v7.2-rc1
>>
>> v1: https://lore.kernel.org/linux-mm/20260616191420.52556-1-jp.kobryn@linux.dev/
>>
>>  mm/internal.h   |  1 +
>>  mm/page_alloc.c | 30 +++++++++++++++++++++++++-----
>>  2 files changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 5a2ddcf68e0b..6700659615e8 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -1478,6 +1478,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>>  #define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
>>  #define ALLOC_TRYLOCK 0x400 /* Only use spin_trylock in allocation path */
>>  #define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
>> +#define ALLOC_HIGHATOMIC_RESERVE 0x1000 /* Allows growing MIGRATE_HIGHATOMIC reserves */
>>
>>  /* Flags that allow allocations below the min watermark. */
>>  #define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d49c254174da..ed919e2ac99a 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3238,7 +3238,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>>  	 * If this is a high-order atomic allocation then check
>>  	 * if the pageblock should be reserved for the future
>>  	 */
>> -	if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
>> +	if (unlikely(alloc_flags & ALLOC_HIGHATOMIC_RESERVE))
>>  		reserve_highatomic_pageblock(page, order, zone);
>
> You could check ALLOC_WMARK_MIN to determine the slowpath. This way
> you wouldn't need another alloc flag:
>
> 	/* Slowpath (precarious) high-atomic allocation. Maybe reserve block */
> 	if (unlikely((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_WMARK_MIN)) == (ALLOC_HIGHATOMIC|ALLOC_WMARK_MIN)))
> 		reserve_highatomic_pageblock(page, order, zone);

Hmm so we just use watermark low vs min to determine fast/slow path.
We would need to mask out the watermark bits since ALLOC_WMARK_MIN is
zero, but I think the idea can work.

>
> [ we really ought to generalize gfp_has_flags() ... ]
>
>>  	__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
>> @@ -3320,8 +3320,9 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
>>  			 *
>>  			 * Instead, direct it towards the reserves by
>>  			 * returning NULL, which will make the caller fall
>> -			 * back to rmqueue_buddy. This will try to use the
>> -			 * reserves first and grow them if needed.
>> +			 * back to rmqueue_buddy. There it will try to use
>> +			 * the reserves first and grow them if needed and
>> +			 * permitted by the ALLOC_HIGHATOMIC_RESERVE flag.
>>  			 */
>>  			if (alloc_flags & ALLOC_HIGHATOMIC)
>>  				return NULL;
>> @@ -3768,6 +3769,24 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
>>  	return alloc_flags;
>>  }
>>
>> +/*
>> + * Let high-priority non-blocking allocations above order-0 and up
>> + * to the costly order try to use existing MIGRATE_HIGHATOMIC
>> + * reserves on the fastpath.
>> + */
>> +static inline unsigned int
>> +alloc_flags_highatomic_fastpath(gfp_t gfp_mask, unsigned int order)
>> +{
>> +	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
>> +		return 0;
>
> There seems to be a mismatch between this and gfp_to_alloc_flags()
> (slowpath), where slowpath is still allowed to tap highatomic reserves
> for costly orders. Is that on purpose?

Yes, the intention was to improve allocator outcomes for frequent net
allocations. So I kept the policy narrow. But I thought about this some
more and agree it's worth removing the order cap. With the cap, if a
costly+ order atomic allocation reaches the buddy path there may be two
non-ideal outcomes: fallback to some other migratetype in the fastpath
or enter the slowpath and make use of any available reserves. In the
latter case, the slowpath could have been avoided altogether. So I'll
make this change in v3.

>
>> +	if (!(gfp_mask & __GFP_HIGH))
>> +		return 0;
>> +	if (gfp_mask & (__GFP_DIRECT_RECLAIM | __GFP_NOMEMALLOC))
>> +		return 0;
>
> This duplicates gfp_to_alloc_flags() logic which seems fragile. How
> about the below:
>
>> +
>> +	return ALLOC_HIGHATOMIC;
>> +}
>> +
>>  /* Must be called after current_gfp_context() which can change gfp_mask */
>>  static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
>>  		unsigned int alloc_flags)
>> @@ -4495,7 +4514,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>>  			alloc_flags |= ALLOC_NON_BLOCK;
>>
>>  			if (order > 0 && (alloc_flags & ALLOC_MIN_RESERVE))
>> -				alloc_flags |= ALLOC_HIGHATOMIC;
>> +				alloc_flags |= (ALLOC_HIGHATOMIC | ALLOC_HIGHATOMIC_RESERVE);
>>  		}
>>
>>  		/*
>> @@ -5215,7 +5234,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
>>  	 * Forbid the first pass from falling back to types that fragment
>>  	 * memory until all local zones are considered.
>>  	 */
>> -	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp);
>> +	alloc_flags |= alloc_flags_nofragment(zonelist_zone(ac.preferred_zoneref), gfp) |
>> +		       alloc_flags_highatomic_fastpath(alloc_gfp, order);
>
> 	alloc_flags |= gfp_to_alloc_flags(gfp, order) & ALLOC_HIGHATOMIC;
>
> Or factor gfp_to_alloc_flags_nonblocking() from gfp_to_alloc_flags()
> and reuse that here, to save a few cycles in the fast path.

This should be possible now that I'll be removing the order cap.

>
>>  	/* First allocation attempt */
>>  	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
>> -- 
>> 2.54.0
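A small userspace check shows why the watermark bits need masking. The values below mirror `mm/internal.h` at the time of writing and should be verified against your tree:

- `ALLOC_WMARK_MIN` and `ALLOC_WMARK_LOW` are the `WMARK_MIN` (0) and `WMARK_LOW` (1) indices.
- `ALLOC_WMARK_MASK` is `ALLOC_NO_WATERMARKS - 1`.

The sketch also assumes the usual flag setup: the first `get_page_from_freelist()` attempt runs with `ALLOC_WMARK_LOW`, and `gfp_to_alloc_flags()` starts from `ALLOC_WMARK_MIN`. The function names are hypothetical.

Because `ALLOC_WMARK_MIN` is 0, OR-ing it into the test mask contributes nothing. The check as proposed therefore reduces to a plain `ALLOC_HIGHATOMIC` test and would also fire on the fastpath.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in mm/internal.h at the time of writing (verify against your tree). */
#define ALLOC_WMARK_MIN     0x0u /* WMARK_MIN index */
#define ALLOC_WMARK_LOW     0x1u /* WMARK_LOW index */
#define ALLOC_NO_WATERMARKS 0x4u
#define ALLOC_WMARK_MASK    (ALLOC_NO_WATERMARKS - 1) /* 0x3: the watermark index bits */
#define ALLOC_HIGHATOMIC    0x200u

/* Check as written in review: since ALLOC_WMARK_MIN == 0 it only tests ALLOC_HIGHATOMIC. */
static bool naive_slowpath_highatomic(unsigned int alloc_flags)
{
	return (alloc_flags & (ALLOC_HIGHATOMIC | ALLOC_WMARK_MIN)) ==
	       (ALLOC_HIGHATOMIC | ALLOC_WMARK_MIN);
}

/* Extract the watermark index first, then compare it against WMARK_MIN. */
static bool masked_slowpath_highatomic(unsigned int alloc_flags)
{
	return (alloc_flags & ALLOC_HIGHATOMIC) &&
	       (alloc_flags & ALLOC_WMARK_MASK) == ALLOC_WMARK_MIN;
}

/* Usage: compare first-attempt (fastpath) flags with slowpath flags. */
static void wmark_selfcheck(void)
{
	unsigned int fast = ALLOC_WMARK_LOW | ALLOC_HIGHATOMIC;
	unsigned int slow = ALLOC_WMARK_MIN | ALLOC_HIGHATOMIC;

	/* The naive check cannot tell the two apart, so the fastpath would grow reserves. */
	assert(naive_slowpath_highatomic(fast) && naive_slowpath_highatomic(slow));
	/* The masked check reserves only on the slowpath. */
	assert(!masked_slowpath_highatomic(fast) && masked_slowpath_highatomic(slow));
}
```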