[PATCH] mm: page_isolation: Avoid hugepage scan step underflow

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] mm: page_isolation: Avoid hugepage scan step underflow
@ 2026-05-19 12:16 Kaitao Cheng
  2026-05-19 17:54 ` Andrew Morton
  2026-05-20  8:51 ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 7+ messages in thread
From: Kaitao Cheng @ 2026-05-19 12:16 UTC (permalink / raw)
  To: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy
  Cc: liushixin2, david, osalvador, linux-mm, linux-kernel,
	Kaitao Cheng

From: Kaitao Cheng <chengkaitao@kylinos.cn>

page_is_unmovable() checks HugeTLB pages without holding hugetlb_lock and
without pinning the folio. This is intentional for the pageblock scanning
paths, but it means the HugeTLB folio can be freed concurrently after
PageHuge() or folio_test_hugetlb() succeeds.

The existing code avoids folio_hstate() and uses size_to_hstate() because
the HugeTLB flag may already have been cleared. However, if
size_to_hstate() returns NULL, the code still falls through and computes
the scan step from folio_nr_pages(). If the folio has been freed and the
head/large state has been cleared, folio_nr_pages() can return 1. When the
current page is a tail page, subtracting folio_page_idx() from 1 can
underflow and make the scanner skip too far.

Treat a NULL hstate as unmovable so the scanner does not try to skip over
an unstable HugeTLB folio. Once a valid hstate is found, derive the number
of pages from the hstate instead of reading the folio size again. Also
validate the page index before computing the step to avoid underflow if the
page/folio relationship changed concurrently.

Fixes: a0a9f2180b90 ("mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock")
Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 mm/page_isolation.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c48ff5c00244..99f0b06efaf6 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -43,6 +43,7 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
 	 */
 	if (PageHuge(page) || PageCompound(page)) {
 		struct folio *folio = page_folio(page);
+		unsigned long idx, nr_pages;

 		if (folio_test_hugetlb(folio)) {
 			struct hstate *h;
@@ -55,14 +56,21 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
 			 * use folio_hstate() directly.
 			 */
 			h = size_to_hstate(folio_size(folio));
-			if (h && !hugepage_migration_supported(h))
+			if (!h || !hugepage_migration_supported(h))
 				return true;

+			nr_pages = pages_per_huge_page(h);
 		} else if (!folio_test_lru(folio)) {
 			return true;
+		} else {
+			nr_pages = folio_nr_pages(folio);
 		}

-		*step = folio_nr_pages(folio) - folio_page_idx(folio, page);
+		idx = folio_page_idx(folio, page);
+		if (idx >= nr_pages)
+			return true;
+
+		*step = nr_pages - idx;
 		return false;
 	}

-- 
2.50.1 (Apple Git-155)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: page_isolation: Avoid hugepage scan step underflow
  2026-05-19 12:16 [PATCH] mm: page_isolation: Avoid hugepage scan step underflow Kaitao Cheng
@ 2026-05-19 17:54 ` Andrew Morton
  2026-05-22  9:35   ` Kaitao Cheng
  2026-05-20  8:51 ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2026-05-19 17:54 UTC (permalink / raw)
  To: Kaitao Cheng
  Cc: vbabka, surenb, mhocko, jackmanb, hannes, ziy, liushixin2, david,
	osalvador, linux-mm, linux-kernel, Kaitao Cheng

On Tue, 19 May 2026 20:16:46 +0800 Kaitao Cheng <kaitao.cheng@linux.dev> wrote:

> page_is_unmovable() checks HugeTLB pages without holding hugetlb_lock and
> without pinning the folio. This is intentional for the pageblock scanning
> paths, but it means the HugeTLB folio can be freed concurrently after
> PageHuge() or folio_test_hugetlb() succeeds.

Thanks.  What are the userspace-visible runtime effects of this?

> The existing code avoids folio_hstate() and uses size_to_hstate() because
> the HugeTLB flag may already have been cleared. However, if
> size_to_hstate() returns NULL, the code still falls through and computes
> the scan step from folio_nr_pages(). If the folio has been freed and the
> head/large state has been cleared, folio_nr_pages() can return 1. When the
> current page is a tail page, subtracting folio_page_idx() from 1 can
> underflow and make the scanner skip too far.
> 
> Treat a NULL hstate as unmovable so the scanner does not try to skip over
> an unstable HugeTLB folio. Once a valid hstate is found, derive the number
> of pages from the hstate instead of reading the folio size again. Also
> validate the page index before computing the step to avoid underflow if the
> page/folio relationship changed concurrently.

This code sounds rather sketchy, and it sounds like it will remain
sketchy after the patch.  And AI review says "hey, this code is
sketchy":
https://sashiko.dev/#/patchset/20260519121646.40833-1-kaitao.cheng@linux.dev

> Fixes: a0a9f2180b90 ("mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock")

We might want to cc:stable on this, depends on the answer to my above
question.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: page_isolation: Avoid hugepage scan step underflow
  2026-05-19 12:16 [PATCH] mm: page_isolation: Avoid hugepage scan step underflow Kaitao Cheng
  2026-05-19 17:54 ` Andrew Morton
@ 2026-05-20  8:51 ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 7+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-20  8:51 UTC (permalink / raw)
  To: Kaitao Cheng, akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy
  Cc: liushixin2, osalvador, linux-mm, linux-kernel, Kaitao Cheng

On 5/19/26 14:16, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@kylinos.cn>
> 
> page_is_unmovable() checks HugeTLB pages without holding hugetlb_lock and
> without pinning the folio. This is intentional for the pageblock scanning
> paths, but it means the HugeTLB folio can be freed concurrently after
> PageHuge() or folio_test_hugetlb() succeeds.
> 
> The existing code avoids folio_hstate() and uses size_to_hstate() because
> the HugeTLB flag may already have been cleared. However, if
> size_to_hstate() returns NULL, the code still falls through and computes
> the scan step from folio_nr_pages(). If the folio has been freed and the
> head/large state has been cleared, folio_nr_pages() can return 1. When the
> current page is a tail page, subtracting folio_page_idx() from 1 can
> underflow and make the scanner skip too far.
> 
> Treat a NULL hstate as unmovable so the scanner does not try to skip over
> an unstable HugeTLB folio. Once a valid hstate is found, derive the number
> of pages from the hstate instead of reading the folio size again. Also
> validate the page index before computing the step to avoid underflow if the
> page/folio relationship changed concurrently.
> 
> Fixes: a0a9f2180b90 ("mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock")
> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
> ---
>  mm/page_isolation.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index c48ff5c00244..99f0b06efaf6 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -43,6 +43,7 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
>  	 */
>  	if (PageHuge(page) || PageCompound(page)) {
>  		struct folio *folio = page_folio(page);
> +		unsigned long idx, nr_pages;
>  
>  		if (folio_test_hugetlb(folio)) {
>  			struct hstate *h;
> @@ -55,14 +56,21 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
>  			 * use folio_hstate() directly.
>  			 */
>  			h = size_to_hstate(folio_size(folio));
> -			if (h && !hugepage_migration_supported(h))
> +			if (!h || !hugepage_migration_supported(h))
>  				return true;
>  
> +			nr_pages = pages_per_huge_page(h);
>  		} else if (!folio_test_lru(folio)) {
>  			return true;
> +		} else {
> +			nr_pages = folio_nr_pages(folio);
>  		}
>  
> -		*step = folio_nr_pages(folio) - folio_page_idx(folio, page);
> +		idx = folio_page_idx(folio, page);
> +		if (idx >= nr_pages)
> +			return true;
> +
> +		*step = nr_pages - idx;
>  		return false;
>  	}
>  

We have very similar code in scan_movable_pages() that we previously fixed.

Seems like we can avoid folio_page_idx() completely by just doing "pfn |=
nr_pages - 1".

If, in corner cases, we skip over some unmovable pages, too bad. The function is
inherently racy.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: page_isolation: Avoid hugepage scan step underflow
  2026-05-19 17:54 ` Andrew Morton
@ 2026-05-22  9:35   ` Kaitao Cheng
  2026-06-01 16:21     ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 7+ messages in thread
From: Kaitao Cheng @ 2026-05-22  9:35 UTC (permalink / raw)
  To: Andrew Morton, david
  Cc: vbabka, surenb, mhocko, jackmanb, hannes, ziy, liushixin2,
	osalvador, linux-mm, linux-kernel, Kaitao Cheng, Muchun Song

在 2026/5/20 01:54, Andrew Morton 写道:
> On Tue, 19 May 2026 20:16:46 +0800 Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> 
>> page_is_unmovable() checks HugeTLB pages without holding hugetlb_lock and
>> without pinning the folio. This is intentional for the pageblock scanning
>> paths, but it means the HugeTLB folio can be freed concurrently after
>> PageHuge() or folio_test_hugetlb() succeeds.
> 
> Thanks.  What are the userspace-visible runtime effects of this?

The direct user-visible effect is not memory corruption, but degraded forward
progress in the page isolation / contiguous allocation path.

If the race makes folio_nr_pages() return 1 while the current page is still
treated as a tail page of the old HugeTLB folio, the computed step can
underflow:

        step = folio_nr_pages(folio) - folio_page_idx(folio, page);

The caller then does:

        start_pfn += step;

With unsigned arithmetic this can wrap and move start_pfn backwards, typically
near the beginning of the old hugepage range rather than advancing past it. In
many cases this only means rescanning part of the same hugepage range, so the
effect may be limited to extra scanning work.

However, it still violates the scanner's forward-progress assumption: step
is expected to advance start_pfn. If the same transient state is observed
repeatedly, the scanner can keep revisiting the same PFNs, causing excessive
latency and, in the worst case, an apparent stall in operations that rely on
page isolation or contiguous allocation.


Here is another point raised by AI, the old code also used folio_test_lru()
on a folio pointer obtained without holding a reference. If the folio is
freed and the old head page is reused or observed as a tail page of another
compound page, folio_test_lru() can reach const_folio_flags(), which asserts
that the passed folio is not a tail page. On DEBUG_VM kernels, that can
trigger a VM_BUG_ON_PGFLAGS() crash.

>> The existing code avoids folio_hstate() and uses size_to_hstate() because
>> the HugeTLB flag may already have been cleared. However, if
>> size_to_hstate() returns NULL, the code still falls through and computes
>> the scan step from folio_nr_pages(). If the folio has been freed and the
>> head/large state has been cleared, folio_nr_pages() can return 1. When the
>> current page is a tail page, subtracting folio_page_idx() from 1 can
>> underflow and make the scanner skip too far.
>>
>> Treat a NULL hstate as unmovable so the scanner does not try to skip over
>> an unstable HugeTLB folio. Once a valid hstate is found, derive the number
>> of pages from the hstate instead of reading the folio size again. Also
>> validate the page index before computing the step to avoid underflow if the
>> page/folio relationship changed concurrently.
> 
> This code sounds rather sketchy, and it sounds like it will remain
> sketchy after the patch.  And AI review says "hey, this code is
> sketchy":
> https://sashiko.dev/#/patchset/20260519121646.40833-1-kaitao.cheng@linux.dev

Following David Hildenbrand's suggestion, I made some changes as shown below.
I'm not sure whether there are still any other issues.

--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -41,8 +41,14 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
         * We need not scan over tail pages because we don't
         * handle each tail page individually in migration.
         */
-       if (PageHuge(page) || PageCompound(page)) {
+       if (PageCompound(page)) {
                struct folio *folio = page_folio(page);
+               unsigned long nr_pages, pfn;
+               unsigned int order;
+
+               order = compound_order(&folio->page);
+               if (order > MAX_FOLIO_ORDER)
+                       return true;

                if (folio_test_hugetlb(folio)) {
                        struct hstate *h;
@@ -54,15 +60,16 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
                         * The huge page may be freed so can not
                         * use folio_hstate() directly.
                         */
-                       h = size_to_hstate(folio_size(folio));
-                       if (h && !hugepage_migration_supported(h))
+                       h = size_to_hstate(PAGE_SIZE << order);
+                       if (!h || !hugepage_migration_supported(h))
                                return true;
-
-               } else if (!folio_test_lru(folio)) {
+               } else if (!PageLRU(page)) {
                        return true;
                }

-               *step = folio_nr_pages(folio) - folio_page_idx(folio, page);
+               nr_pages = 1UL << order;
+               pfn = page_to_pfn(page);
+               *step = (pfn | (nr_pages - 1)) + 1 - pfn;
                return false;
        }

-- 
Thanks
Kaitao Cheng



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: page_isolation: Avoid hugepage scan step underflow
  2026-05-22  9:35   ` Kaitao Cheng
@ 2026-06-01 16:21     ` David Hildenbrand (Arm)
  2026-06-02  7:08       ` Kaitao Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-01 16:21 UTC (permalink / raw)
  To: Kaitao Cheng, Andrew Morton
  Cc: vbabka, surenb, mhocko, jackmanb, hannes, ziy, liushixin2,
	osalvador, linux-mm, linux-kernel, Kaitao Cheng, Muchun Song

On 5/22/26 11:35, Kaitao Cheng wrote:
> 在 2026/5/20 01:54, Andrew Morton 写道:
>> On Tue, 19 May 2026 20:16:46 +0800 Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>
>>> page_is_unmovable() checks HugeTLB pages without holding hugetlb_lock and
>>> without pinning the folio. This is intentional for the pageblock scanning
>>> paths, but it means the HugeTLB folio can be freed concurrently after
>>> PageHuge() or folio_test_hugetlb() succeeds.
>>
>> Thanks.  What are the userspace-visible runtime effects of this?
> 
> The direct user-visible effect is not memory corruption, but degraded forward
> progress in the page isolation / contiguous allocation path.
> 
> If the race makes folio_nr_pages() return 1 while the current page is still
> treated as a tail page of the old HugeTLB folio, the computed step can
> underflow:
> 
>         step = folio_nr_pages(folio) - folio_page_idx(folio, page);
> 
> The caller then does:
> 
>         start_pfn += step;
> 
> With unsigned arithmetic this can wrap and move start_pfn backwards, typically
> near the beginning of the old hugepage range rather than advancing past it. In
> many cases this only means rescanning part of the same hugepage range, so the
> effect may be limited to extra scanning work.
> 
> However, it still violates the scanner's forward-progress assumption: step
> is expected to advance start_pfn. If the same transient state is observed
> repeatedly, the scanner can keep revisiting the same PFNs, causing excessive
> latency and, in the worst case, an apparent stall in operations that rely on
> page isolation or contiguous allocation.
> 
> 
> Here is another point raised by AI, the old code also used folio_test_lru()
> on a folio pointer obtained without holding a reference. If the folio is
> freed and the old head page is reused or observed as a tail page of another
> compound page, folio_test_lru() can reach const_folio_flags(), which asserts
> that the passed folio is not a tail page. On DEBUG_VM kernels, that can
> trigger a VM_BUG_ON_PGFLAGS() crash.
> 
>>> The existing code avoids folio_hstate() and uses size_to_hstate() because
>>> the HugeTLB flag may already have been cleared. However, if
>>> size_to_hstate() returns NULL, the code still falls through and computes
>>> the scan step from folio_nr_pages(). If the folio has been freed and the
>>> head/large state has been cleared, folio_nr_pages() can return 1. When the
>>> current page is a tail page, subtracting folio_page_idx() from 1 can
>>> underflow and make the scanner skip too far.
>>>
>>> Treat a NULL hstate as unmovable so the scanner does not try to skip over
>>> an unstable HugeTLB folio. Once a valid hstate is found, derive the number
>>> of pages from the hstate instead of reading the folio size again. Also
>>> validate the page index before computing the step to avoid underflow if the
>>> page/folio relationship changed concurrently.
>>
>> This code sounds rather sketchy, and it sounds like it will remain
>> sketchy after the patch.  And AI review says "hey, this code is
>> sketchy":
>> https://sashiko.dev/#/patchset/20260519121646.40833-1-kaitao.cheng@linux.dev
> 
> Following David Hildenbrand's suggestion, I made some changes as shown below.
> I'm not sure whether there are still any other issues.
> 
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -41,8 +41,14 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
>          * We need not scan over tail pages because we don't
>          * handle each tail page individually in migration.
>          */
> -       if (PageHuge(page) || PageCompound(page)) {
> +       if (PageCompound(page)) {
>                 struct folio *folio = page_folio(page);
> +               unsigned long nr_pages, pfn;
> +               unsigned int order;
> +
> +               order = compound_order(&folio->page);
> +               if (order > MAX_FOLIO_ORDER)
> +                       return true;

Would we also have to care about non-order-of-2?

> 
>                 if (folio_test_hugetlb(folio)) {
>                         struct hstate *h;
> @@ -54,15 +60,16 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
>                          * The huge page may be freed so can not
>                          * use folio_hstate() directly.
>                          */
> -                       h = size_to_hstate(folio_size(folio));
> -                       if (h && !hugepage_migration_supported(h))
> +                       h = size_to_hstate(PAGE_SIZE << order);
> +                       if (!h || !hugepage_migration_supported(h))
>                                 return true;
> -
> -               } else if (!folio_test_lru(folio)) {
> +               } else if (!PageLRU(page)) {

Hm, is that required because we could VM_BUG_ON?

>                         return true;
>                 }
> 
> -               *step = folio_nr_pages(folio) - folio_page_idx(folio, page);
> +               nr_pages = 1UL << order;
> +               pfn = page_to_pfn(page);
> +               *step = (pfn | (nr_pages - 1)) + 1 - pfn;
>                 return false;
>         }
> 


-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: page_isolation: Avoid hugepage scan step underflow
  2026-06-01 16:21     ` David Hildenbrand (Arm)
@ 2026-06-02  7:08       ` Kaitao Cheng
  2026-06-02  9:30         ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 7+ messages in thread
From: Kaitao Cheng @ 2026-06-02  7:08 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Andrew Morton
  Cc: vbabka, surenb, mhocko, jackmanb, hannes, ziy, liushixin2,
	osalvador, linux-mm, linux-kernel, Kaitao Cheng, Muchun Song

在 2026/6/2 00:21, David Hildenbrand (Arm) 写道:
> On 5/22/26 11:35, Kaitao Cheng wrote:
>> 在 2026/5/20 01:54, Andrew Morton 写道:
>>> On Tue, 19 May 2026 20:16:46 +0800 Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>>
>>>> page_is_unmovable() checks HugeTLB pages without holding hugetlb_lock and
>>>> without pinning the folio. This is intentional for the pageblock scanning
>>>> paths, but it means the HugeTLB folio can be freed concurrently after
>>>> PageHuge() or folio_test_hugetlb() succeeds.
>>>
>>> Thanks.  What are the userspace-visible runtime effects of this?
>>
>> The direct user-visible effect is not memory corruption, but degraded forward
>> progress in the page isolation / contiguous allocation path.
>>
>> If the race makes folio_nr_pages() return 1 while the current page is still
>> treated as a tail page of the old HugeTLB folio, the computed step can
>> underflow:
>>
>>         step = folio_nr_pages(folio) - folio_page_idx(folio, page);
>>
>> The caller then does:
>>
>>         start_pfn += step;
>>
>> With unsigned arithmetic this can wrap and move start_pfn backwards, typically
>> near the beginning of the old hugepage range rather than advancing past it. In
>> many cases this only means rescanning part of the same hugepage range, so the
>> effect may be limited to extra scanning work.
>>
>> However, it still violates the scanner's forward-progress assumption: step
>> is expected to advance start_pfn. If the same transient state is observed
>> repeatedly, the scanner can keep revisiting the same PFNs, causing excessive
>> latency and, in the worst case, an apparent stall in operations that rely on
>> page isolation or contiguous allocation.
>>
>>
>> Here is another point raised by AI, the old code also used folio_test_lru()
>> on a folio pointer obtained without holding a reference. If the folio is
>> freed and the old head page is reused or observed as a tail page of another
>> compound page, folio_test_lru() can reach const_folio_flags(), which asserts
>> that the passed folio is not a tail page. On DEBUG_VM kernels, that can
>> trigger a VM_BUG_ON_PGFLAGS() crash.
>>
>>>> The existing code avoids folio_hstate() and uses size_to_hstate() because
>>>> the HugeTLB flag may already have been cleared. However, if
>>>> size_to_hstate() returns NULL, the code still falls through and computes
>>>> the scan step from folio_nr_pages(). If the folio has been freed and the
>>>> head/large state has been cleared, folio_nr_pages() can return 1. When the
>>>> current page is a tail page, subtracting folio_page_idx() from 1 can
>>>> underflow and make the scanner skip too far.
>>>>
>>>> Treat a NULL hstate as unmovable so the scanner does not try to skip over
>>>> an unstable HugeTLB folio. Once a valid hstate is found, derive the number
>>>> of pages from the hstate instead of reading the folio size again. Also
>>>> validate the page index before computing the step to avoid underflow if the
>>>> page/folio relationship changed concurrently.
>>>
>>> This code sounds rather sketchy, and it sounds like it will remain
>>> sketchy after the patch.  And AI review says "hey, this code is
>>> sketchy":
>>> https://sashiko.dev/#/patchset/20260519121646.40833-1-kaitao.cheng@linux.dev
>>
>> Following David Hildenbrand's suggestion, I made some changes as shown below.
>> I'm not sure whether there are still any other issues.
>>
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -41,8 +41,14 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
>>          * We need not scan over tail pages because we don't
>>          * handle each tail page individually in migration.
>>          */
>> -       if (PageHuge(page) || PageCompound(page)) {
>> +       if (PageCompound(page)) {
>>                 struct folio *folio = page_folio(page);
>> +               unsigned long nr_pages, pfn;
>> +               unsigned int order;
>> +
>> +               order = compound_order(&folio->page);
>> +               if (order > MAX_FOLIO_ORDER)
>> +                       return true;
> 
> Would we also have to care about non-order-of-2?

There should not be any non-order-of-2 compound pages here. Neither the
buddy allocator's allocation model nor the folio_size() / size_to_hstate()
implementation supports non-order-of-two compound pages.
 
>>
>>                 if (folio_test_hugetlb(folio)) {
>>                         struct hstate *h;
>> @@ -54,15 +60,16 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
>>                          * The huge page may be freed so can not
>>                          * use folio_hstate() directly.
>>                          */
>> -                       h = size_to_hstate(folio_size(folio));
>> -                       if (h && !hugepage_migration_supported(h))
>> +                       h = size_to_hstate(PAGE_SIZE << order);
>> +                       if (!h || !hugepage_migration_supported(h))
>>                                 return true;
>> -
>> -               } else if (!folio_test_lru(folio)) {
>> +               } else if (!PageLRU(page)) {
> 
> Hm, is that required because we could VM_BUG_ON?

Yes, that is one of the reasons for the change.

page_is_unmovable() does not hold a reference on the folio, and the folio
can be freed and reused concurrently while this scanner is running. In that
case, the previously obtained folio pointer may no longer refer to a valid
folio head. If the original head page is reused as a tail page of another
compound page, folio_test_lru(folio) can hit the folio flag checks and
potentially trigger a VM_BUG_ON_PGFLAGS().

>>                         return true;
>>                 }
>>
>> -               *step = folio_nr_pages(folio) - folio_page_idx(folio, page);
>> +               nr_pages = 1UL << order;
>> +               pfn = page_to_pfn(page);
>> +               *step = (pfn | (nr_pages - 1)) + 1 - pfn;
>>                 return false;
>>         }
>>
> 
> 

-- 
Thanks
Kaitao Cheng



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: page_isolation: Avoid hugepage scan step underflow
  2026-06-02  7:08       ` Kaitao Cheng
@ 2026-06-02  9:30         ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-02  9:30 UTC (permalink / raw)
  To: Kaitao Cheng, Andrew Morton
  Cc: vbabka, surenb, mhocko, jackmanb, hannes, ziy, liushixin2,
	osalvador, linux-mm, linux-kernel, Kaitao Cheng, Muchun Song

On 6/2/26 09:08, Kaitao Cheng wrote:
> 在 2026/6/2 00:21, David Hildenbrand (Arm) 写道:
>> On 5/22/26 11:35, Kaitao Cheng wrote:
>>> 在 2026/5/20 01:54, Andrew Morton 写道:
>>>
>>> The direct user-visible effect is not memory corruption, but degraded forward
>>> progress in the page isolation / contiguous allocation path.
>>>
>>> If the race makes folio_nr_pages() return 1 while the current page is still
>>> treated as a tail page of the old HugeTLB folio, the computed step can
>>> underflow:
>>>
>>>         step = folio_nr_pages(folio) - folio_page_idx(folio, page);
>>>
>>> The caller then does:
>>>
>>>         start_pfn += step;
>>>
>>> With unsigned arithmetic this can wrap and move start_pfn backwards, typically
>>> near the beginning of the old hugepage range rather than advancing past it. In
>>> many cases this only means rescanning part of the same hugepage range, so the
>>> effect may be limited to extra scanning work.
>>>
>>> However, it still violates the scanner's forward-progress assumption: step
>>> is expected to advance start_pfn. If the same transient state is observed
>>> repeatedly, the scanner can keep revisiting the same PFNs, causing excessive
>>> latency and, in the worst case, an apparent stall in operations that rely on
>>> page isolation or contiguous allocation.
>>>
>>>
>>> Here is another point raised by AI, the old code also used folio_test_lru()
>>> on a folio pointer obtained without holding a reference. If the folio is
>>> freed and the old head page is reused or observed as a tail page of another
>>> compound page, folio_test_lru() can reach const_folio_flags(), which asserts
>>> that the passed folio is not a tail page. On DEBUG_VM kernels, that can
>>> trigger a VM_BUG_ON_PGFLAGS() crash.
>>>
>>>
>>> Following David Hildenbrand's suggestion, I made some changes as shown below.
>>> I'm not sure whether there are still any other issues.
>>>
>>> --- a/mm/page_isolation.c
>>> +++ b/mm/page_isolation.c
>>> @@ -41,8 +41,14 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
>>>          * We need not scan over tail pages because we don't
>>>          * handle each tail page individually in migration.
>>>          */
>>> -       if (PageHuge(page) || PageCompound(page)) {
>>> +       if (PageCompound(page)) {
>>>                 struct folio *folio = page_folio(page);
>>> +               unsigned long nr_pages, pfn;
>>> +               unsigned int order;
>>> +
>>> +               order = compound_order(&folio->page);
>>> +               if (order > MAX_FOLIO_ORDER)
>>> +                       return true;
>>
>> Would we also have to care about non-order-of-2?
> 
> There should not be any non-order-of-2 compound pages here. Neither the
> buddy allocator's allocation model nor the folio_size() / size_to_hstate()
> implementation supports non-order-of-two compound pages.

Ah, sorry, in scan_movable_pages() we use folio_nr_pages(), and that can give
you on races a non-order-of-2.

So yeah, clearly not required here.

>  
>>>
>>>                 if (folio_test_hugetlb(folio)) {
>>>                         struct hstate *h;
>>> @@ -54,15 +60,16 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
>>>                          * The huge page may be freed so can not
>>>                          * use folio_hstate() directly.
>>>                          */
>>> -                       h = size_to_hstate(folio_size(folio));
>>> -                       if (h && !hugepage_migration_supported(h))
>>> +                       h = size_to_hstate(PAGE_SIZE << order);
>>> +                       if (!h || !hugepage_migration_supported(h))
>>>                                 return true;
>>> -
>>> -               } else if (!folio_test_lru(folio)) {
>>> +               } else if (!PageLRU(page)) {
>>
>> Hm, is that required because we could VM_BUG_ON?
> 
> Yes, that is one of the reasons for the change.
> 
> page_is_unmovable() does not hold a reference on the folio, and the folio
> can be freed and reused concurrently while this scanner is running. In that
> case, the previously obtained folio pointer may no longer refer to a valid
> folio head. If the original head page is reused as a tail page of another
> compound page, folio_test_lru(folio) can hit the folio flag checks and
> potentially trigger a VM_BUG_ON_PGFLAGS().

Okay, spell that out in the patch description.


-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-02  9:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-19 12:16 [PATCH] mm: page_isolation: Avoid hugepage scan step underflow Kaitao Cheng
2026-05-19 17:54 ` Andrew Morton
2026-05-22  9:35   ` Kaitao Cheng
2026-06-01 16:21     ` David Hildenbrand (Arm)
2026-06-02  7:08       ` Kaitao Cheng
2026-06-02  9:30         ` David Hildenbrand (Arm)
2026-05-20  8:51 ` David Hildenbrand (Arm)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.