[RFC PATCH] page_alloc: allow migration of smaller hugepages during contig

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
@ 2025-10-20 17:06 Gregory Price
  2025-10-20 17:24 ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Gregory Price @ 2025-10-20 17:06 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, kernel-team, akpm, vbabka, surenb, mhocko, jackmanb,
	hannes, ziy, David Hildenbrand

We presently skip regions with hugepages entirely when trying to do
contiguous page allocation.  Instead, if hugepage migration is enabled,
consider regions with hugepages smaller than the target contiguous
allocation request as valid targets for allocation.

Compaction `isolate_migrate_pages_block()` already expects requests
with hugepages to originate from alloc_contig, and hugetlb code also
does a migratable check when isolating in `folio_isolate_hugetlb()`.

We add the migration check here to avoid calling compaction on a
region if we know migration is not possible at all.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
 mm/page_alloc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23..e0760eafe032 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7048,8 +7048,14 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
 		if (PageReserved(page))
 			return false;

-		if (PageHuge(page))
-			return false;
+		if (PageHuge(page)) {
+			struct folio *folio = page_folio(page);
+
+			/* Don't consider moving same size/larger pages */
+			if (!folio_test_hugetlb_migratable(folio) ||
+			    (1 << folio_order(folio) >= nr_pages))
+				return false;
+		}
 	}
 	return true;
 }
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
  2025-10-20 17:06 [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc Gregory Price
@ 2025-10-20 17:24 ` David Hildenbrand
  2025-10-20 17:41   ` Gregory Price
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2025-10-20 17:24 UTC (permalink / raw)
  To: Gregory Price, linux-mm
  Cc: linux-kernel, kernel-team, akpm, vbabka, surenb, mhocko, jackmanb,
	hannes, ziy

On 20.10.25 19:06, Gregory Price wrote:
> We presently skip regions with hugepages entirely when trying to do
> contiguous page allocation.  Instead, if hugepage migration is enabled,
> consider regions with hugepages smaller than the target contiguous
> allocation request as valid targets for allocation.
> 
> Compaction `isolate_migrate_pages_block()` already expects requests
> with hugepages to originate from alloc_contig, and hugetlb code also
> does a migratable check when isolating in `folio_isolate_hugetlb()`.
> 
> We add the migration check here to avoid calling compaction on a
> region if we know migration is not possible at all.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
>   mm/page_alloc.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 600d9e981c23..e0760eafe032 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7048,8 +7048,14 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
>   		if (PageReserved(page))
>   			return false;
>   
> -		if (PageHuge(page))
> -			return false;
> +		if (PageHuge(page)) {
> +			struct folio *folio = page_folio(page);
> +
> +			/* Don't consider moving same size/larger pages */
> +			if (!folio_test_hugetlb_migratable(folio) ||
> +			    (1 << folio_order(folio) >= nr_pages))

We have folio_nr_pages().

Do we really need the folio_hugetlb_migratable() check?

> +				return false;

This code is completely racy. folio_nr_pages() should be fine AFAIKT (no 
VM_WARN_ON() etc), not sure about folio_test_hugetlb_migratable().

If it becomes a problem we could do a snapshot_page() to take a snapshot 
we can query.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
  2025-10-20 17:24 ` David Hildenbrand
@ 2025-10-20 17:41   ` Gregory Price
  2025-10-20 19:15     ` Zi Yan
  0 siblings, 1 reply; 11+ messages in thread
From: Gregory Price @ 2025-10-20 17:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes, ziy

On Mon, Oct 20, 2025 at 07:24:04PM +0200, David Hildenbrand wrote:
> On 20.10.25 19:06, Gregory Price wrote:
> 
> Do we really need the folio_hugetlb_migratable() check?
> This code is completely racy.

My thought was it's better to check if any *one* folio in the bunch is
non-migratable, it's better to never even call compaction in the first
place.  But you're right, this is racy.

In one race, the compaction code will just fail if this bit gets set
between now and the isolate call in folio_isolate_hugetlb() - resulting
in searching the next block anyway.  So that seemed ok?

In the other race, the bit becomes un-set and we skip a block that might
otherwise be valid.

I can drop this check, it's just an optimistic optimization anyway.

I should also probably check CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION here
regardless, since we should skip compaction if migration isn't possible.

>> folio_nr_pages() should be fine AFAIKT (no
> VM_WARN_ON() etc), not sure about folio_test_hugetlb_migratable().

will change, and will check/change based on above thoughts.

~Gregory

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
  2025-10-20 17:41   ` Gregory Price
@ 2025-10-20 19:15     ` Zi Yan
  2025-10-20 19:18       ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Zi Yan @ 2025-10-20 19:15 UTC (permalink / raw)
  To: Gregory Price, David Hildenbrand
  Cc: linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes

On 20 Oct 2025, at 13:41, Gregory Price wrote:

> On Mon, Oct 20, 2025 at 07:24:04PM +0200, David Hildenbrand wrote:
>> On 20.10.25 19:06, Gregory Price wrote:
>>
>> Do we really need the folio_hugetlb_migratable() check?
>> This code is completely racy.
>
> My thought was it's better to check if any *one* folio in the bunch is
> non-migratable, it's better to never even call compaction in the first
> place.  But you're right, this is racy.
>
> In one race, the compaction code will just fail if this bit gets set
> between now and the isolate call in folio_isolate_hugetlb() - resulting
> in searching the next block anyway.  So that seemed ok?
>
> In the other race, the bit becomes un-set and we skip a block that might
> otherwise be valid.
>
> I can drop this check, it's just an optimistic optimization anyway.
>
> I should also probably check CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION here
> regardless, since we should skip compaction if migration isn't possible.
>
>>> folio_nr_pages() should be fine AFAIKT (no
>> VM_WARN_ON() etc), not sure about folio_test_hugetlb_migratable().
>
> will change, and will check/change based on above thoughts.

If it is racy, could folio_order() or folio_nr_pages() return a bogusly
large and cause a wrong result?

In isolate_migratepages_block(), compound_order(page) is used and checked
against MAX_PAGE_ORDER to avoid a bogus page order. I wonder if we should
use the same pattern here.

Basically, what is the right way of checking a folio order without lock?
Should we have a standardized helper function for that?

--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
  2025-10-20 19:15     ` Zi Yan
@ 2025-10-20 19:18       ` David Hildenbrand
  2025-10-20 19:40         ` Gregory Price
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2025-10-20 19:18 UTC (permalink / raw)
  To: Zi Yan, Gregory Price
  Cc: linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes

On 20.10.25 21:15, Zi Yan wrote:
> On 20 Oct 2025, at 13:41, Gregory Price wrote:
> 
>> On Mon, Oct 20, 2025 at 07:24:04PM +0200, David Hildenbrand wrote:
>>> On 20.10.25 19:06, Gregory Price wrote:
>>>
>>> Do we really need the folio_hugetlb_migratable() check?
>>> This code is completely racy.
>>
>> My thought was it's better to check if any *one* folio in the bunch is
>> non-migratable, it's better to never even call compaction in the first
>> place.  But you're right, this is racy.
>>
>> In one race, the compaction code will just fail if this bit gets set
>> between now and the isolate call in folio_isolate_hugetlb() - resulting
>> in searching the next block anyway.  So that seemed ok?
>>
>> In the other race, the bit becomes un-set and we skip a block that might
>> otherwise be valid.
>>
>> I can drop this check, it's just an optimistic optimization anyway.
>>
>> I should also probably check CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION here
>> regardless, since we should skip compaction if migration isn't possible.
>>
>>>> folio_nr_pages() should be fine AFAIKT (no
>>> VM_WARN_ON() etc), not sure about folio_test_hugetlb_migratable().
>>
>> will change, and will check/change based on above thoughts.
> 
> If it is racy, could folio_order() or folio_nr_pages() return a bogusly
> large and cause a wrong result?
> 
> In isolate_migratepages_block(), compound_order(page) is used and checked
> against MAX_PAGE_ORDER to avoid a bogus page order. I wonder if we should
> use the same pattern here.
> 
> Basically, what is the right way of checking a folio order without lock?
> Should we have a standardized helper function for that?

As raised, snapshot_page() tries to stabilize the folio best it can.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
  2025-10-20 19:18       ` David Hildenbrand
@ 2025-10-20 19:40         ` Gregory Price
  2025-10-20 19:46           ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Gregory Price @ 2025-10-20 19:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Zi Yan, linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes

On Mon, Oct 20, 2025 at 09:18:36PM +0200, David Hildenbrand wrote:
> > 
> > Basically, what is the right way of checking a folio order without lock?
> > Should we have a standardized helper function for that?
> 
> As raised, snapshot_page() tries to stabilize the folio best it can.

is snapshot_page() even worth it if we're already racing on flag checks?

i.e. there's already a race condition between

	pfn_range_valid_contig(range) -> compaction(range)

on some bogus value the worst that happens is either compaction gets
called when it can't compact, or compaction doesn't get called when it
could compact - either way, compaction still handles it if called.

We may just skip some blocks (which is still better than now).

--

on compound_order - from mm.h:

/*
 * compound_order() can be called without holding a reference, which means
 * that niceties like page_folio() don't work.  These callers should be
 * prepared to handle wild return values.  For example, PG_head may be
 * set before the order is initialised, or this may be a tail page.
 * See compaction.c for some good examples.
 */

Seems like the correct interface given the scenario. I'll poke around.

~Gregory

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
  2025-10-20 19:40         ` Gregory Price
@ 2025-10-20 19:46           ` David Hildenbrand
  2025-10-20 19:58             ` Gregory Price
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2025-10-20 19:46 UTC (permalink / raw)
  To: Gregory Price
  Cc: Zi Yan, linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes

On 20.10.25 21:40, Gregory Price wrote:
> On Mon, Oct 20, 2025 at 09:18:36PM +0200, David Hildenbrand wrote:
>>>
>>> Basically, what is the right way of checking a folio order without lock?
>>> Should we have a standardized helper function for that?
>>
>> As raised, snapshot_page() tries to stabilize the folio best it can.
> 
> is snapshot_page() even worth it if we're already racing on flag checks?

I think it tries to handle what compound_order() cannot easily handle, 
as it will retry in case it detects an obvious race.

> 
> i.e. there's already a race condition between
> 
> 	pfn_range_valid_contig(range) -> compaction(range)

Can you elaborate how compaction comes into play here? I'm missing the 
interaction.

pfn_range_valid_contig() should be only called by alloc_contig_pages() 
and not out of compaction context?

> 
> on some bogus value the worst that happens is either compaction gets
> called when it can't compact, or compaction doesn't get called when it
> could compact - either way, compaction still handles it if called.
> 
> We may just skip some blocks (which is still better than now).
> 
> --
> 
> on compound_order - from mm.h:
> 
> /*
>   * compound_order() can be called without holding a reference, which means
>   * that niceties like page_folio() don't work.  These callers should be
>   * prepared to handle wild return values.  For example, PG_head may be
>   * set before the order is initialised, or this may be a tail page.
>   * See compaction.c for some good examples.
>   */
> 
> Seems like the correct interface given the scenario. I'll poke around.

Yes, avoiding folios altogether is even better. As documented, it can 
result in crazy values due to races that must be handled (like 
compaction, yes).

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
  2025-10-20 19:46           ` David Hildenbrand
@ 2025-10-20 19:58             ` Gregory Price
  2025-10-20 20:17               ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Gregory Price @ 2025-10-20 19:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Zi Yan, linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes

On Mon, Oct 20, 2025 at 09:46:21PM +0200, David Hildenbrand wrote:
> On 20.10.25 21:40, Gregory Price wrote:
> > On Mon, Oct 20, 2025 at 09:18:36PM +0200, David Hildenbrand wrote:
> > > > 
> > > > Basically, what is the right way of checking a folio order without lock?
> > > > Should we have a standardized helper function for that?
> > > 
> > > As raised, snapshot_page() tries to stabilize the folio best it can.
> > 
> > is snapshot_page() even worth it if we're already racing on flag checks?
> 
> I think it tries to handle what compound_order() cannot easily handle, as it
> will retry in case it detects an obvious race.
> 
> > 
> > i.e. there's already a race condition between
> > 
> > 	pfn_range_valid_contig(range) -> compaction(range)
> 
> Can you elaborate how compaction comes into play here? I'm missing the
> interaction.
> 
> pfn_range_valid_contig() should be only called by alloc_contig_pages() and
> not out of compaction context?
> 

I've been digging through the code a bit, so a quick shot from my notes

alloc_contig_pages_noprof
  if (pfn_range_valid_contig(range))        <- check validity
    __alloc_contig_pages(range)
      alloc_contig_range_noprof(range)
        start_isolate_page_range(range)     <- isolate
        __alloc_contig_migrate_range(range)
          isolate_migratepages_range(range) <- compact

Seems like all the checks done in pfn_range_valid_contig() already race
with everything after it anyway since references aren't held?  Any of
those pages could be freed (get bogus values), but i suppose not
allocated (given the zone lock is held)?

> > Seems like the correct interface given the scenario. I'll poke around.
> 
> Yes, avoiding folios altogether is even better. As documented, it can result
> in crazy values due to races that must be handled (like compaction, yes).
> 

i'll make this swap then.

~Gregory


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
  2025-10-20 19:58             ` Gregory Price
@ 2025-10-20 20:17               ` David Hildenbrand
  2025-10-20 20:27                 ` Gregory Price
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2025-10-20 20:17 UTC (permalink / raw)
  To: Gregory Price
  Cc: Zi Yan, linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes

On 20.10.25 21:58, Gregory Price wrote:
> On Mon, Oct 20, 2025 at 09:46:21PM +0200, David Hildenbrand wrote:
>> On 20.10.25 21:40, Gregory Price wrote:
>>> On Mon, Oct 20, 2025 at 09:18:36PM +0200, David Hildenbrand wrote:
>>>>>
>>>>> Basically, what is the right way of checking a folio order without lock?
>>>>> Should we have a standardized helper function for that?
>>>>
>>>> As raised, snapshot_page() tries to stabilize the folio best it can.
>>>
>>> is snapshot_page() even worth it if we're already racing on flag checks?
>>
>> I think it tries to handle what compound_order() cannot easily handle, as it
>> will retry in case it detects an obvious race.
>>
>>>
>>> i.e. there's already a race condition between
>>>
>>> 	pfn_range_valid_contig(range) -> compaction(range)
>>
>> Can you elaborate how compaction comes into play here? I'm missing the
>> interaction.
>>
>> pfn_range_valid_contig() should be only called by alloc_contig_pages() and
>> not out of compaction context?
>>
> 
> I've been digging through the code a bit, so a quick shot from my notes
> 
> alloc_contig_pages_noprof
>    if (pfn_range_valid_contig(range))        <- check validity
>      __alloc_contig_pages(range)
>        alloc_contig_range_noprof(range)
>          start_isolate_page_range(range)     <- isolate
>          __alloc_contig_migrate_range(range)
>            isolate_migratepages_range(range) <- compact

Oh, that's what you mean with "compact", it's just isolation+migration.

> 
> Seems like all the checks done in pfn_range_valid_contig() already race
> with everything after it anyway since references aren't held?  Any of
> those pages could be freed (get bogus values), but i suppose not
> allocated (given the zone lock is held)?

Yes, it's completely racy.

I was primarily concerned about us calling functions that will 
VM_WARN_ON() etc due to the races; not that they would make us 
accept/jump over a range although we shouldn't.

Of course, regarding the latter, we want to try as good as possible to 
avoid jumping over ranges that we can actually handle.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
  2025-10-20 20:17               ` David Hildenbrand
@ 2025-10-20 20:27                 ` Gregory Price
  2025-10-20 20:38                   ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Gregory Price @ 2025-10-20 20:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Zi Yan, linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes

On Mon, Oct 20, 2025 at 10:17:42PM +0200, David Hildenbrand wrote:
> Yes, it's completely racy.
> 
> I was primarily concerned about us calling functions that will VM_WARN_ON()
> etc due to the races; not that they would make us accept/jump over a range
> although we shouldn't.
> 
> Of course, regarding the latter, we want to try as good as possible to avoid
> jumping over ranges that we can actually handle.
>

I'll go ahead and add a snapshot_page.

~Gregory


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc.
  2025-10-20 20:27                 ` Gregory Price
@ 2025-10-20 20:38                   ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2025-10-20 20:38 UTC (permalink / raw)
  To: Gregory Price
  Cc: Zi Yan, linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes

On 20.10.25 22:27, Gregory Price wrote:
> On Mon, Oct 20, 2025 at 10:17:42PM +0200, David Hildenbrand wrote:
>> Yes, it's completely racy.
>>
>> I was primarily concerned about us calling functions that will VM_WARN_ON()
>> etc due to the races; not that they would make us accept/jump over a range
>> although we shouldn't.
>>
>> Of course, regarding the latter, we want to try as good as possible to avoid
>> jumping over ranges that we can actually handle.
>>
> 
> I'll go ahead and add a snapshot_page.

I'd say it's probably good enough initially to just use compound_head() 
and filter out crazy values.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-10-20 20:38 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-20 17:06 [RFC PATCH] page_alloc: allow migration of smaller hugepages during contig_alloc Gregory Price
2025-10-20 17:24 ` David Hildenbrand
2025-10-20 17:41   ` Gregory Price
2025-10-20 19:15     ` Zi Yan
2025-10-20 19:18       ` David Hildenbrand
2025-10-20 19:40         ` Gregory Price
2025-10-20 19:46           ` David Hildenbrand
2025-10-20 19:58             ` Gregory Price
2025-10-20 20:17               ` David Hildenbrand
2025-10-20 20:27                 ` Gregory Price
2025-10-20 20:38                   ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).