* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
@ 2025-04-08 17:22 ` Brendan Jackman
2025-04-08 18:50 ` Johannes Weiner
2025-04-09 8:02 ` Yunsheng Lin
` (4 subsequent siblings)
5 siblings, 1 reply; 11+ messages in thread
From: Brendan Jackman @ 2025-04-08 17:22 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Mel Gorman, Carlos Song, linux-mm, linux-kernel,
kernel test robot, stable
On Mon Apr 7, 2025 at 6:01 PM UTC, Johannes Weiner wrote:
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
> * The use of signed ints for order and current_order is a deliberate
> * deviation from the rest of this file, to make the for loop
> * condition simpler.
> - *
> - * Return the stolen page, or NULL if none can be found.
> */
This commentary is pretty confusing now: there's a block of text that
kinda vaguely applies to the aggregate of __rmqueue_steal(),
__rmqueue_fallback() and half of __rmqueue(). I think this new code does
a better job of speaking for itself, so we should just delete
this block comment and replace it with some more verbosity elsewhere.
> +
> +/* Try to claim a whole foreign block, take a page, expand the remainder */
Also on the commentary front, I am not a fan of "foreign" and "native":
- "Foreign" is already used in this file to mean NUMA-nonlocal.
- We already have "start" and "fallback" being used in identifiers
as adjectives to describe the migratetype concept.
I wouldn't say those are _better_, "native" and "foreign" might be
clearer, but it's not worth introducing inconsistency IMO.
> static __always_inline struct page *
> -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> unsigned int alloc_flags)
> {
> struct free_area *area;
[pasting in more context that wasn't in the original diff..]
> /*
> * Find the largest available free page in the other list. This roughly
> * approximates finding the pageblock with the most free pages, which
> * would be too costly to do exactly.
> */
> for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
> --current_order) {
IIUC we could go one step further here and also avoid repeating this
iteration? Maybe something for a separate patch though?
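To make that concrete, here is the rough shape I have in mind - the
claim_order hint is hypothetical and this is only a sketch of the idea,
not something that compiles as-is: the rmqueue_bulk() caller, which holds
zone->lock across the whole batch, would own the hint so the part of the
scan that already came up empty is not repeated on the next call.

	/* hypothetical: *claim_order is seeded with MAX_PAGE_ORDER by the
	 * caller and persists across __rmqueue_claim() calls made under
	 * the same zone->lock section */
	for (current_order = *claim_order; current_order >= min_order;
	     --current_order) {
		area = &(zone->free_area[current_order]);
		fallback_mt = find_suitable_fallback(area, current_order, ...);
		if (fallback_mt == -1)
			continue;
		/* don't rescan the (empty) orders above this one next time */
		*claim_order = current_order;
		/* ... unchanged block claiming logic ... */
	}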
Anyway, the approach seems like a clear improvement, thanks. I will need
to take a closer look at it tomorrow, I've run out of brain juice today.
Here's what I got from redistributing the block comment and flipping
the terminology:
diff --git i/mm/page_alloc.c w/mm/page_alloc.c
index dfb2b3f508af..b8142d605691 100644
--- i/mm/page_alloc.c
+++ w/mm/page_alloc.c
@@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
}
/*
- * Try finding a free buddy page on the fallback list.
- *
- * This will attempt to claim a whole pageblock for the requested type
- * to ensure grouping of such requests in the future.
- *
- * If a whole block cannot be claimed, steal an individual page, regressing to
- * __rmqueue_smallest() logic to at least break up as little contiguity as
- * possible.
+ * Try to allocate from some fallback migratetype by claiming the entire block,
+ * i.e. converting it to the allocation's start migratetype.
*
* The use of signed ints for order and current_order is a deliberate
* deviation from the rest of this file, to make the for loop
* condition simpler.
*/
-
-/* Try to claim a whole foreign block, take a page, expand the remainder */
static __always_inline struct page *
__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
unsigned int alloc_flags)
@@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
return NULL;
}
-/* Try to steal a single page from a foreign block */
+/*
+ * Try to steal a single page from some fallback migratetype. Leave the rest of
+ * the block as its current migratetype, potentially causing fragmentation.
+ */
static __always_inline struct page *
__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
{
@@ -2307,7 +2302,9 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
}
/*
- * Try the different freelists, native then foreign.
+ * First try the freelists of the requested migratetype, then try
+ * fallbacks. Roughly, each fallback stage poses more of a fragmentation
+ * risk.
*
* The fallback logic is expensive and rmqueue_bulk() calls in
* a loop with the zone->lock held, meaning the freelists are
@@ -2332,7 +2329,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
case RMQUEUE_CLAIM:
page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
if (page) {
- /* Replenished native freelist, back to normal mode */
+ /* Replenished requested migratetype's freelist, back to normal mode */
*mode = RMQUEUE_NORMAL;
return page;
}
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-08 17:22 ` Brendan Jackman
@ 2025-04-08 18:50 ` Johannes Weiner
2025-04-09 17:30 ` Brendan Jackman
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2025-04-08 18:50 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song, linux-mm,
linux-kernel, kernel test robot, stable
On Tue, Apr 08, 2025 at 05:22:00PM +0000, Brendan Jackman wrote:
> On Mon Apr 7, 2025 at 6:01 PM UTC, Johannes Weiner wrote:
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
> > * The use of signed ints for order and current_order is a deliberate
> > * deviation from the rest of this file, to make the for loop
> > * condition simpler.
> > - *
> > - * Return the stolen page, or NULL if none can be found.
> > */
>
> This commentary is pretty confusing now: there's a block of text that
> kinda vaguely applies to the aggregate of __rmqueue_steal(),
> __rmqueue_fallback() and half of __rmqueue(). I think this new code does
> a better job of speaking for itself, so we should just delete
> this block comment and replace it with some more verbosity elsewhere.
I'm glad you think so, let's remove it then!
> > +/* Try to claim a whole foreign block, take a page, expand the remainder */
>
> Also on the commentary front, I am not a fan of "foreign" and "native":
>
> - "Foreign" is already used in this file to mean NUMA-nonlocal.
>
> - We already have "start" and "fallback" being used in identifiers
> as adjectives to describe the migratetype concept.
>
> I wouldn't say those are _better_, "native" and "foreign" might be
> clearer, but it's not worth introducing inconsistency IMO.
That's a fair point, no objection to renaming them.
> > static __always_inline struct page *
> > -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> > +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> > unsigned int alloc_flags)
> > {
> > struct free_area *area;
>
> [pasting in more context that wasn't in the original diff..]
> > /*
> > * Find the largest available free page in the other list. This roughly
> > * approximates finding the pageblock with the most free pages, which
> > * would be too costly to do exactly.
> > */
> > for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
> > --current_order) {
>
> IIUC we could go one step further here and also avoid repeating this
> iteration? Maybe something for a separate patch though?
That might be worth a test, but I agree this should be a separate patch.
AFAICS, in the most common configurations MAX_PAGE_ORDER is only one
step above pageblock_order or even the same. It might not be worth the
complication.
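(For a rough sense of scale - assuming a typical x86-64 configuration with
4K pages and THP enabled, so treat the exact numbers as an assumption:

	MAX_PAGE_ORDER  == 10	/* largest buddy: 4MB */
	pageblock_order ==  9	/* one pageblock: 2MB */

so the top-down scan in __rmqueue_claim() only has a single extra order to
walk before it is at pageblock granularity anyway.)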
> Anyway, the approach seems like a clear improvement, thanks. I will need
> to take a closer look at it tomorrow, I've run out of brain juice today.
I much appreciate you taking a look, thanks.
> Here's what I got from redistributing the block comment and flipping
> the terminology:
>
> diff --git i/mm/page_alloc.c w/mm/page_alloc.c
> index dfb2b3f508af..b8142d605691 100644
> --- i/mm/page_alloc.c
> +++ w/mm/page_alloc.c
> @@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
> }
>
> /*
> - * Try finding a free buddy page on the fallback list.
> - *
> - * This will attempt to claim a whole pageblock for the requested type
> - * to ensure grouping of such requests in the future.
> - *
> - * If a whole block cannot be claimed, steal an individual page, regressing to
> - * __rmqueue_smallest() logic to at least break up as little contiguity as
> - * possible.
> + * Try to allocate from some fallback migratetype by claiming the entire block,
> + * i.e. converting it to the allocation's start migratetype.
> *
> * The use of signed ints for order and current_order is a deliberate
> * deviation from the rest of this file, to make the for loop
> * condition simpler.
> */
> -
> -/* Try to claim a whole foreign block, take a page, expand the remainder */
> static __always_inline struct page *
> __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> unsigned int alloc_flags)
> @@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> return NULL;
> }
>
> -/* Try to steal a single page from a foreign block */
> +/*
> + * Try to steal a single page from some fallback migratetype. Leave the rest of
> + * the block as its current migratetype, potentially causing fragmentation.
> + */
> static __always_inline struct page *
> __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
> {
> @@ -2307,7 +2302,9 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> }
>
> /*
> - * Try the different freelists, native then foreign.
> + * First try the freelists of the requested migratetype, then try
> + * fallbacks. Roughly, each fallback stage poses more of a fragmentation
> + * risk.
How about "then try fallback modes with increasing levels of
fragmentation risk."
> * The fallback logic is expensive and rmqueue_bulk() calls in
> * a loop with the zone->lock held, meaning the freelists are
> @@ -2332,7 +2329,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> case RMQUEUE_CLAIM:
> page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
> if (page) {
> - /* Replenished native freelist, back to normal mode */
> + /* Replenished requested migratetype's freelist, back to normal mode */
> *mode = RMQUEUE_NORMAL;
This line is kind of long now. How about:
/* Replenished preferred freelist, back to normal mode */
But yeah, I like your proposed changes. Would you care to send a
proper patch?
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-08 18:50 ` Johannes Weiner
@ 2025-04-09 17:30 ` Brendan Jackman
2025-04-10 8:16 ` Vlastimil Babka
0 siblings, 1 reply; 11+ messages in thread
From: Brendan Jackman @ 2025-04-09 17:30 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song, linux-mm,
linux-kernel, kernel test robot, stable
On Tue Apr 8, 2025 at 6:50 PM UTC, Johannes Weiner wrote:
>> > /*
>> > * Find the largest available free page in the other list. This roughly
>> > * approximates finding the pageblock with the most free pages, which
>> > * would be too costly to do exactly.
>> > */
>> > for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
>> > --current_order) {
>>
>> IIUC we could go one step further here and also avoid repeating this
>> iteration? Maybe something for a separate patch though?
>
> That might be worth a test, but I agree this should be a separate patch.
>
> AFAICS, in the most common configurations MAX_PAGE_ORDER is only one
> step above pageblock_order or even the same. It might not be worth the
> complication.
Oh yeah, makes sense.
>> /*
>> - * Try the different freelists, native then foreign.
>> + * First try the freelists of the requested migratetype, then try
>> + * fallbacks. Roughly, each fallback stage poses more of a fragmentation
>> + * risk.
>
> How about "then try fallback modes with increasing levels of
> fragmentation risk."
Yep, nice thanks.
>> * The fallback logic is expensive and rmqueue_bulk() calls in
>> * a loop with the zone->lock held, meaning the freelists are
>> @@ -2332,7 +2329,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>> case RMQUEUE_CLAIM:
>> page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
>> if (page) {
>> - /* Replenished native freelist, back to normal mode */
>> + /* Replenished requested migratetype's freelist, back to normal mode */
>> *mode = RMQUEUE_NORMAL;
>
> This line is kind of long now. How about:
>
> /* Replenished preferred freelist, back to normal mode */
Yep, sounds good - it's still 81 characters, and the rest of this file
sticks to 80 for comments; I guess I'll leave it to Andrew to decide if
that is an issue?
> But yeah, I like your proposed changes. Would you care to send a
> proper patch?
Sure, pasting below. Andrew, could you fold this in? Also, I haven't
done this style of patch sending before, so please let me know if I'm
doing something that makes your life difficult.
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Aside from the comment stuff fixed by the patch below:
Reviewed-by: Brendan Jackman <jackmanb@google.com>
---
From 8ff20dbb52770d082e182482d2b47e521de028d1 Mon Sep 17 00:00:00 2001
From: Brendan Jackman <jackmanb@google.com>
Date: Wed, 9 Apr 2025 17:22:14 +0000
Subject: [PATCH] page_alloc: speed up fallbacks in rmqueue_bulk() - comment updates
Tidy up some terminology and redistribute commentary.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/page_alloc.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dfb2b3f508af4..220bd0bcc38c3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
}
/*
- * Try finding a free buddy page on the fallback list.
- *
- * This will attempt to claim a whole pageblock for the requested type
- * to ensure grouping of such requests in the future.
- *
- * If a whole block cannot be claimed, steal an individual page, regressing to
- * __rmqueue_smallest() logic to at least break up as little contiguity as
- * possible.
+ * Try to allocate from some fallback migratetype by claiming the entire block,
+ * i.e. converting it to the allocation's start migratetype.
*
* The use of signed ints for order and current_order is a deliberate
* deviation from the rest of this file, to make the for loop
* condition simpler.
*/
-
-/* Try to claim a whole foreign block, take a page, expand the remainder */
static __always_inline struct page *
__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
unsigned int alloc_flags)
@@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
return NULL;
}
-/* Try to steal a single page from a foreign block */
+/*
+ * Try to steal a single page from some fallback migratetype. Leave the rest of
+ * the block as its current migratetype, potentially causing fragmentation.
+ */
static __always_inline struct page *
__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
{
@@ -2307,7 +2302,8 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
}
/*
- * Try the different freelists, native then foreign.
+ * First try the freelists of the requested migratetype, then try
+ * fallback modes with increasing levels of fragmentation risk.
*
* The fallback logic is expensive and rmqueue_bulk() calls in
* a loop with the zone->lock held, meaning the freelists are
@@ -2332,7 +2328,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
case RMQUEUE_CLAIM:
page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
if (page) {
- /* Replenished native freelist, back to normal mode */
+ /* Replenished preferred freelist, back to normal mode. */
*mode = RMQUEUE_NORMAL;
return page;
}
base-commit: aa42382db4e2a4ed1f4ba97ffc50e2ce45accb0c
--
2.49.0.504.g3bcea36a83-goog
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-09 17:30 ` Brendan Jackman
@ 2025-04-10 8:16 ` Vlastimil Babka
0 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2025-04-10 8:16 UTC (permalink / raw)
To: Brendan Jackman, Johannes Weiner
Cc: Andrew Morton, Mel Gorman, Carlos Song, linux-mm, linux-kernel,
kernel test robot, stable
On 4/9/25 19:30, Brendan Jackman wrote:
>
> From 8ff20dbb52770d082e182482d2b47e521de028d1 Mon Sep 17 00:00:00 2001
> From: Brendan Jackman <jackmanb@google.com>
> Date: Wed, 9 Apr 2025 17:22:14 +0000
> Subject: [PATCH] page_alloc: speed up fallbacks in rmqueue_bulk() - comment updates
>
> Tidy up some terminology and redistribute commentary.
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
LGTM (assuming folding)
> ---
> mm/page_alloc.c | 22 +++++++++-------------
> 1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dfb2b3f508af4..220bd0bcc38c3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
> }
>
> /*
> - * Try finding a free buddy page on the fallback list.
> - *
> - * This will attempt to claim a whole pageblock for the requested type
> - * to ensure grouping of such requests in the future.
> - *
> - * If a whole block cannot be claimed, steal an individual page, regressing to
> - * __rmqueue_smallest() logic to at least break up as little contiguity as
> - * possible.
> + * Try to allocate from some fallback migratetype by claiming the entire block,
> + * i.e. converting it to the allocation's start migratetype.
> *
> * The use of signed ints for order and current_order is a deliberate
> * deviation from the rest of this file, to make the for loop
> * condition simpler.
> */
> -
> -/* Try to claim a whole foreign block, take a page, expand the remainder */
> static __always_inline struct page *
> __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> unsigned int alloc_flags)
> @@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> return NULL;
> }
>
> -/* Try to steal a single page from a foreign block */
> +/*
> + * Try to steal a single page from some fallback migratetype. Leave the rest of
> + * the block as its current migratetype, potentially causing fragmentation.
> + */
> static __always_inline struct page *
> __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
> {
> @@ -2307,7 +2302,8 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> }
>
> /*
> - * Try the different freelists, native then foreign.
> + * First try the freelists of the requested migratetype, then try
> + * fallback modes with increasing levels of fragmentation risk.
> *
> * The fallback logic is expensive and rmqueue_bulk() calls in
> * a loop with the zone->lock held, meaning the freelists are
> @@ -2332,7 +2328,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> case RMQUEUE_CLAIM:
> page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
> if (page) {
> - /* Replenished native freelist, back to normal mode */
> + /* Replenished preferred freelist, back to normal mode. */
> *mode = RMQUEUE_NORMAL;
> return page;
> }
>
> base-commit: aa42382db4e2a4ed1f4ba97ffc50e2ce45accb0c
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
2025-04-08 17:22 ` Brendan Jackman
@ 2025-04-09 8:02 ` Yunsheng Lin
2025-04-09 14:00 ` Johannes Weiner
2025-04-10 2:02 ` Zi Yan
` (3 subsequent siblings)
5 siblings, 1 reply; 11+ messages in thread
From: Yunsheng Lin @ 2025-04-09 8:02 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
linux-mm, linux-kernel, kernel test robot, stable
On 2025/4/8 2:01, Johannes Weiner wrote:
...
>
> @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> {
> struct page *page;
> unsigned long flags;
> + enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>
> do {
> page = NULL;
> @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> if (alloc_flags & ALLOC_HIGHATOMIC)
> page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> if (!page) {
> - page = __rmqueue(zone, order, migratetype, alloc_flags);
> + page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
>
> /*
> * If the allocation fails, allow OOM handling and
It was not in the diff, but it seems the zone->lock is held inside the do..while loop,
doesn't it mean that the freelists are subject to outside changes and rmqm is stale?
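To make the concern concrete, the structure I am looking at is roughly this
(paraphrased from the patched rmqueue_buddy(), with error handling and
statistics elided):

	enum rmqueue_mode rmqm = RMQUEUE_NORMAL;	/* survives loop iterations */

	do {
		page = NULL;
		spin_lock_irqsave(&zone->lock, flags);
		if (alloc_flags & ALLOC_HIGHATOMIC)
			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
		if (!page)
			page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
		/* ... */
		spin_unlock_irqrestore(&zone->lock, flags);
	} while (check_new_pages(page, order));

The lock is dropped at the end of every iteration, but rmqm keeps whatever
mode the previous iteration ended in.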
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-09 8:02 ` Yunsheng Lin
@ 2025-04-09 14:00 ` Johannes Weiner
0 siblings, 0 replies; 11+ messages in thread
From: Johannes Weiner @ 2025-04-09 14:00 UTC (permalink / raw)
To: Yunsheng Lin
Cc: Andrew Morton, Vlastimil Babka, Brendan Jackman, Mel Gorman,
Carlos Song, linux-mm, linux-kernel, kernel test robot, stable
On Wed, Apr 09, 2025 at 04:02:39PM +0800, Yunsheng Lin wrote:
> On 2025/4/8 2:01, Johannes Weiner wrote:
> > @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> > {
> > struct page *page;
> > unsigned long flags;
> > + enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
> >
> > do {
> > page = NULL;
> > @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> > if (alloc_flags & ALLOC_HIGHATOMIC)
> > page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> > if (!page) {
> > - page = __rmqueue(zone, order, migratetype, alloc_flags);
> > + page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
> >
> > /*
> > * If the allocation fails, allow OOM handling and
>
> It was not in the diff, but it seems the zone->lock is held inside the do..while loop,
> doesn't it mean that the freelists are subject to outside changes and rmqm is stale?
Yes. Note that it only loops when there is a bug/corrupted page, so it
won't make much difference in practice. But it's still kind of weird.
Thanks for your review, Yunsheng!
Andrew, could you please fold the below fixlet?
---
From 71b1eea7ded41c1f674909f9755c23b9ee9bcb6a Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Wed, 9 Apr 2025 09:56:52 -0400
Subject: [PATCH] mm: page_alloc: speed up fallbacks in rmqueue_bulk() fix
reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dfb2b3f508af..7ffeeb0f62d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2983,7 +2983,6 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
{
struct page *page;
unsigned long flags;
- enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
do {
page = NULL;
@@ -2996,6 +2995,8 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
if (alloc_flags & ALLOC_HIGHATOMIC)
page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
if (!page) {
+ enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
+
page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
/*
--
2.49.0
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
2025-04-08 17:22 ` Brendan Jackman
2025-04-09 8:02 ` Yunsheng Lin
@ 2025-04-10 2:02 ` Zi Yan
2025-04-10 7:03 ` [EXT] " Carlos Song
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Zi Yan @ 2025-04-10 2:02 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
linux-mm, linux-kernel, kernel test robot, stable
On Mon Apr 7, 2025 at 2:01 PM EDT, Johannes Weiner wrote:
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
> single pages from biggest buddy") as the root cause of a 56.4%
> regression in vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
> freelist movement during block conversion"), as the root cause for a
> regression in worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be
> less greedy in an effort to stave off fragmentation. The flip side of
> this is that fallbacks are also less productive each time around,
> which means the fallback search can run much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to
> refill the percpu cache by allocating a large batch of pages in a
> loop. It highlights how once the native freelists are exhausted, the
> fallback code first scans orders top-down for whole blocks to claim,
> then falls back to a bottom-up search for the smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already
> failed, there is no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and
> restart directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test
> robot reported against c2f6ea38fc1b:
>
> commit:
> f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
> c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
> acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered
> compared to before the earlier culprit patch:
>
> 2dd482ba627d (before freelist hygiene): 1ms
> c0cd6f557b90 (after freelist hygiene): 90ms
> next-20250319 (steal smallest buddy): 280ms
> this patch : 8ms
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org # 6.10+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 74 insertions(+), 26 deletions(-)
>
It is a really nice cleanup. It improves my understanding of the rmqueue*() functions
and the whole flow a lot. Thank you for the patch.
Acked-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
* RE: [EXT] [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
` (2 preceding siblings ...)
2025-04-10 2:02 ` Zi Yan
@ 2025-04-10 7:03 ` Carlos Song
2025-04-10 8:12 ` Vlastimil Babka
2025-04-10 10:48 ` Shivank Garg
5 siblings, 0 replies; 11+ messages in thread
From: Carlos Song @ 2025-04-10 7:03 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel test robot,
stable@vger.kernel.org
Hi,
That is a nice fix! I tested it on imx7d.
Tested-by: Carlos Song <carlos.song@nxp.com>
> -----Original Message-----
> From: Johannes Weiner <hannes@cmpxchg.org>
> Sent: Tuesday, April 8, 2025 2:02 AM
> To: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>; Brendan Jackman
> <jackmanb@google.com>; Mel Gorman <mgorman@techsingularity.net>;
> Carlos Song <carlos.song@nxp.com>; linux-mm@kvack.org;
> linux-kernel@vger.kernel.org; kernel test robot <oliver.sang@intel.com>;
> stable@vger.kernel.org
> Subject: [EXT] [PATCH 1/2] mm: page_alloc: speed up fallbacks in
> rmqueue_bulk()
>
>
>
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal single
> pages from biggest buddy") as the root cause of a 56.4% regression in
> vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix freelist
> movement during block conversion"), as the root cause for a regression in
> worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be less greedy
> in an effort to stave off fragmentation. The flip side of this is that fallbacks are
> also less productive each time around, which means the fallback search can run
> much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to refill the
> percpu cache by allocating a large batch of pages in a loop. It highlights how
> once the native freelists are exhausted, the fallback code first scans orders
> top-down for whole blocks to claim, then falls back to a bottom-up search for the
> smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already failed, there is
> no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and restart
> directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test robot
> reported against c2f6ea38fc1b:
>
> commit:
> f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
> c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
> acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered compared to
> before the earlier culprit patch:
>
> 2dd482ba627d (before freelist hygiene): 1ms
> c0cd6f557b90 (after freelist hygiene): 90ms
> next-20250319 (steal smallest buddy): 280ms
> this patch : 8ms
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org # 6.10+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 74 insertions(+), 26 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f51aa6051a99..03b0d45ed45a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
> * The use of signed ints for order and current_order is a deliberate
> * deviation from the rest of this file, to make the for loop
> * condition simpler.
> - *
> - * Return the stolen page, or NULL if none can be found.
> */
> +
> +/* Try to claim a whole foreign block, take a page, expand the remainder */
> static __always_inline struct page *
> -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>                 unsigned int alloc_flags)
> {
> struct free_area *area;
> @@ -2236,14 +2236,26 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
>                 page = try_to_claim_block(zone, page, current_order, order,
>                                           start_migratetype, fallback_mt,
>                                           alloc_flags);
> -               if (page)
> -                       goto got_one;
> +               if (page) {
> +                       trace_mm_page_alloc_extfrag(page, order, current_order,
> +                                                   start_migratetype, fallback_mt);
> +                       return page;
> +               }
>         }
>
> - if (alloc_flags & ALLOC_NOFRAGMENT)
> - return NULL;
> + return NULL;
> +}
> +
> +/* Try to steal a single page from a foreign block */
> +static __always_inline struct page *
> +__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
> +{
> + struct free_area *area;
> + int current_order;
> + struct page *page;
> + int fallback_mt;
> + bool claim_block;
>
> - /* No luck claiming pageblock. Find the smallest fallback page */
>         for (current_order = order; current_order < NR_PAGE_ORDERS;
>              current_order++) {
>                 area = &(zone->free_area[current_order]);
>                 fallback_mt = find_suitable_fallback(area, current_order,
> @@ -2253,25 +2265,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
>
>                 page = get_page_from_free_area(area, fallback_mt);
>                 page_del_and_expand(zone, page, order, current_order,
>                                     fallback_mt);
> -               goto got_one;
> +               trace_mm_page_alloc_extfrag(page, order, current_order,
> +                                           start_migratetype, fallback_mt);
> +               return page;
>         }
>
> return NULL;
> -
> -got_one:
> - trace_mm_page_alloc_extfrag(page, order, current_order,
> - start_migratetype, fallback_mt);
> -
> - return page;
> }
>
> +enum rmqueue_mode {
> + RMQUEUE_NORMAL,
> + RMQUEUE_CMA,
> + RMQUEUE_CLAIM,
> + RMQUEUE_STEAL,
> +};
> +
> /*
> * Do the hard work of removing an element from the buddy allocator.
> * Call me with the zone->lock already held.
> */
> static __always_inline struct page *
> __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> -                                                       unsigned int alloc_flags)
> +                                                       unsigned int alloc_flags, enum rmqueue_mode *mode)
> {
> struct page *page;
>
> @@ -2290,16 +2305,47 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> }
> }
>
> - page = __rmqueue_smallest(zone, order, migratetype);
> - if (unlikely(!page)) {
> - if (alloc_flags & ALLOC_CMA)
> + /*
> + * Try the different freelists, native then foreign.
> + *
> + * The fallback logic is expensive and rmqueue_bulk() calls in
> + * a loop with the zone->lock held, meaning the freelists are
> + * not subject to any outside changes. Remember in *mode where
> + * we found pay dirt, to save us the search on the next call.
> + */
> + switch (*mode) {
> + case RMQUEUE_NORMAL:
> + page = __rmqueue_smallest(zone, order, migratetype);
> + if (page)
> + return page;
> + fallthrough;
> + case RMQUEUE_CMA:
> + if (alloc_flags & ALLOC_CMA) {
> page = __rmqueue_cma_fallback(zone, order);
> -
> - if (!page)
> -                       page = __rmqueue_fallback(zone, order, migratetype,
> -                                                 alloc_flags);
> +                       if (page) {
> +                               *mode = RMQUEUE_CMA;
> +                               return page;
> +                       }
> +               }
> +               fallthrough;
> +       case RMQUEUE_CLAIM:
> +               page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
> +               if (page) {
> +                       /* Replenished native freelist, back to normal mode */
> +                       *mode = RMQUEUE_NORMAL;
> +                       return page;
> +               }
> +               fallthrough;
> +       case RMQUEUE_STEAL:
> +               if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
> +                       page = __rmqueue_steal(zone, order, migratetype);
> + if (page) {
> + *mode = RMQUEUE_STEAL;
> + return page;
> + }
> + }
> }
> - return page;
> + return NULL;
> }
>
> /*
> @@ -2311,6 +2357,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>                 unsigned long count, struct list_head *list,
>                 int migratetype, unsigned int alloc_flags)
> {
> +       enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>         unsigned long flags;
>         int i;
>
> @@ -2321,7 +2368,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>         }
>         for (i = 0; i < count; ++i) {
>                 struct page *page = __rmqueue(zone, order, migratetype,
> -                                             alloc_flags);
> +                                             alloc_flags, &rmqm);
> if (unlikely(page == NULL))
> break;
>
> @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> {
>         struct page *page;
>         unsigned long flags;
> +       enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>
>         do {
>                 page = NULL;
> @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>                 if (alloc_flags & ALLOC_HIGHATOMIC)
>                         page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>                 if (!page) {
> -                       page = __rmqueue(zone, order, migratetype, alloc_flags);
> +                       page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
>
> /*
> * If the allocation fails, allow OOM handling and
> --
> 2.49.0
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
` (3 preceding siblings ...)
2025-04-10 7:03 ` [EXT] " Carlos Song
@ 2025-04-10 8:12 ` Vlastimil Babka
2025-04-10 10:48 ` Shivank Garg
5 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2025-04-10 8:12 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Brendan Jackman, Mel Gorman, Carlos Song, linux-mm, linux-kernel,
kernel test robot, stable
On 4/7/25 20:01, Johannes Weiner wrote:
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
> single pages from biggest buddy") as the root cause of a 56.4%
> regression in vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
> freelist movement during block conversion"), as the root cause for a
> regression in worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be
> less greedy in an effort to stave off fragmentation. The flip side of
> this is that fallbacks are also less productive each time around,
> which means the fallback search can run much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to
> refill the percpu cache by allocating a large batch of pages in a
> loop. It highlights how once the native freelists are exhausted, the
> fallback code first scans orders top-down for whole blocks to claim,
> then falls back to a bottom-up search for the smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already
> failed, there is no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and
> restart directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test
> robot reported against c2f6ea38fc1b:
>
> commit:
> f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
> c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
> acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered
> compared to before the earlier culprit patch:
>
> 2dd482ba627d (before freelist hygiene): 1ms
> c0cd6f557b90 (after freelist hygiene): 90ms
> next-20250319 (steal smallest buddy): 280ms
> this patch : 8ms
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org # 6.10+
The stable backport might be fun with the dependency commits.
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cool stuff.
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
` (4 preceding siblings ...)
2025-04-10 8:12 ` Vlastimil Babka
@ 2025-04-10 10:48 ` Shivank Garg
5 siblings, 0 replies; 11+ messages in thread
From: Shivank Garg @ 2025-04-10 10:48 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
linux-mm, linux-kernel, kernel test robot, stable
On 4/7/2025 11:31 PM, Johannes Weiner wrote:
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
> single pages from biggest buddy") as the root cause of a 56.4%
> regression in vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
> freelist movement during block conversion"), as the root cause for a
> regression in worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be
> less greedy in an effort to stave off fragmentation. The flip side of
> this is that fallbacks are also less productive each time around,
> which means the fallback search can run much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to
> refill the percpu cache by allocating a large batch of pages in a
> loop. It highlights how once the native freelists are exhausted, the
> fallback code first scans orders top-down for whole blocks to claim,
> then falls back to a bottom-up search for the smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already
> failed, there is no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and
> restart directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test
> robot reported against c2f6ea38fc1b:
>
> commit:
> f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
> c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
> acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered
> compared to before the earlier culprit patch:
>
> 2dd482ba627d (before freelist hygiene): 1ms
> c0cd6f557b90 (after freelist hygiene): 90ms
> next-20250319 (steal smallest buddy): 280ms
> this patch : 8ms
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org # 6.10+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
Tested on AMD Zen 3 EPYC (2-socket and 1 NUMA node, 64 CPUs on each socket)
with vm-scalability/300s-lru-file-mmap-read:

              Vanilla      Patched      Change
Throughput   32267451     36112127      +11.9% improvement
Stddev%       2477.36      2969.18      +19.8%
Free Time    0.144072     0.148774       +3.2%
Median         227967       249851       +9.6%
Tested-by: Shivank Garg <shivankg@amd.com>
Thanks,
Shivank