Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
@ 2026-05-06  1:21 Ye Liu
  2026-05-06  2:05 ` Lance Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ye Liu @ 2026-05-06  1:21 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Xin Hao
  Cc: Ye Liu, Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Andrew Morton,
	linux-mm, linux-kernel

From: Ye Liu <liuye@kylinos.cn>

__khugepaged_enter() sets MMF_VM_HUGEPAGE before allocating the
corresponding mm_slot.  If mm_slot_alloc() fails, the function
returns with the flag set but without inserting the mm into the
khugepaged tracking structures.

This leaves the mm in an inconsistent state: it is marked as
registered (MMF_VM_HUGEPAGE set), but will never be scanned by
khugepaged.  Future attempts to register the mm are skipped since
khugepaged_enter_vma() checks the flag and returns early.

Fix this by clearing MMF_VM_HUGEPAGE when mm_slot_alloc() fails,
restoring the ability to retry registration later.

Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for struct mm_slot")
Signed-off-by: Ye Liu <liuye@kylinos.cn>
---
Changes since v1:
- Add Fixes tag as suggested by Dev Jain and Lance Yang

 mm/khugepaged.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7d48d4fbd5f3..60ab7c1b61dd 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -559,8 +559,10 @@ void __khugepaged_enter(struct mm_struct *mm)
 		return;
 
 	slot = mm_slot_alloc(mm_slot_cache);
-	if (!slot)
+	if (!slot) {
+		mm_flags_clear(MMF_VM_HUGEPAGE, mm);
 		return;
+	}
 
 	spin_lock(&khugepaged_mm_lock);
 	mm_slot_insert(mm_slots_hash, mm, slot);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
  2026-05-06  1:21 [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure Ye Liu
@ 2026-05-06  2:05 ` Lance Yang
  2026-05-06  5:17 ` Baolin Wang
  2026-05-06  6:46 ` Dev Jain
  2 siblings, 0 replies; 10+ messages in thread
From: Lance Yang @ 2026-05-06  2:05 UTC (permalink / raw)
  To: ye.liu
  Cc: akpm, david, ljs, xhao, liuye, ziy, baolin.wang, liam, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm, linux-mm,
	linux-kernel


On Wed, May 06, 2026 at 09:21:30AM +0800, Ye Liu wrote:
>From: Ye Liu <liuye@kylinos.cn>
>
>__khugepaged_enter() sets MMF_VM_HUGEPAGE before allocating the
>corresponding mm_slot.  If mm_slot_alloc() fails, the function
>returns with the flag set but without inserting the mm into the
>khugepaged tracking structures.
>
>This leaves the mm in an inconsistent state: it is marked as
>registered (MMF_VM_HUGEPAGE set), but will never be scanned by
>khugepaged.  Future attempts to register the mm are skipped since
>khugepaged_enter_vma() checks the flag and returns early.
>
>Fix this by clearing MMF_VM_HUGEPAGE when mm_slot_alloc() fails,
>restoring the ability to retry registration later.
>
>Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for struct mm_slot")

Nit: the title does not exactly match the original commit subject.

It should be

Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for "struct mm_slot"")

which includes quotes around "struct mm_slot".

No need to resend; I think Andrew can fix it up when applying :)

>Signed-off-by: Ye Liu <liuye@kylinos.cn>
>---

Thanks, LGTM.
Reviewed-by: Lance Yang <lance.yang@linux.dev>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
  2026-05-06  1:21 [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure Ye Liu
  2026-05-06  2:05 ` Lance Yang
@ 2026-05-06  5:17 ` Baolin Wang
  2026-05-06  6:46 ` Dev Jain
  2 siblings, 0 replies; 10+ messages in thread
From: Baolin Wang @ 2026-05-06  5:17 UTC (permalink / raw)
  To: Ye Liu, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Xin Hao
  Cc: Ye Liu, Zi Yan, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Andrew Morton, linux-mm,
	linux-kernel



On 5/6/26 9:21 AM, Ye Liu wrote:
> From: Ye Liu <liuye@kylinos.cn>
> 
> __khugepaged_enter() sets MMF_VM_HUGEPAGE before allocating the
> corresponding mm_slot.  If mm_slot_alloc() fails, the function
> returns with the flag set but without inserting the mm into the
> khugepaged tracking structures.
> 
> This leaves the mm in an inconsistent state: it is marked as
> registered (MMF_VM_HUGEPAGE set), but will never be scanned by
> khugepaged.  Future attempts to register the mm are skipped since
> khugepaged_enter_vma() checks the flag and returns early.
> 
> Fix this by clearing MMF_VM_HUGEPAGE when mm_slot_alloc() fails,
> restoring the ability to retry registration later.
> 
> Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for struct mm_slot")
> Signed-off-by: Ye Liu <liuye@kylinos.cn>
> ---

Good catch. Feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
  2026-05-06  1:21 [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure Ye Liu
  2026-05-06  2:05 ` Lance Yang
  2026-05-06  5:17 ` Baolin Wang
@ 2026-05-06  6:46 ` Dev Jain
  2026-05-06 10:51   ` Lance Yang
  2 siblings, 1 reply; 10+ messages in thread
From: Dev Jain @ 2026-05-06  6:46 UTC (permalink / raw)
  To: Ye Liu, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Xin Hao
  Cc: Ye Liu, Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Barry Song, Lance Yang, Andrew Morton, linux-mm,
	linux-kernel



On 06/05/26 6:51 am, Ye Liu wrote:
> From: Ye Liu <liuye@kylinos.cn>
> 
> __khugepaged_enter() sets MMF_VM_HUGEPAGE before allocating the
> corresponding mm_slot.  If mm_slot_alloc() fails, the function
> returns with the flag set but without inserting the mm into the
> khugepaged tracking structures.
> 
> This leaves the mm in an inconsistent state: it is marked as
> registered (MMF_VM_HUGEPAGE set), but will never be scanned by
> khugepaged.  Future attempts to register the mm are skipped since
> khugepaged_enter_vma() checks the flag and returns early.
> 
> Fix this by clearing MMF_VM_HUGEPAGE when mm_slot_alloc() fails,
> restoring the ability to retry registration later.
> 
> Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for struct mm_slot")
> Signed-off-by: Ye Liu <liuye@kylinos.cn>
> ---
> Changes since v1:
> - Add Fixes tag as suggested by Dev Jain and Lance Yang
> 
>  mm/khugepaged.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7d48d4fbd5f3..60ab7c1b61dd 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -559,8 +559,10 @@ void __khugepaged_enter(struct mm_struct *mm)
>  		return;
>  
>  	slot = mm_slot_alloc(mm_slot_cache);
> -	if (!slot)
> +	if (!slot) {
> +		mm_flags_clear(MMF_VM_HUGEPAGE, mm);
>  		return;
> +	}

Note that, a racing khugepaged_enter_vma() may back off
when it sees that MMF_VM_HUGEPAGE is set, but then the above
clears the flag after slot alloc failure. So we end up not
registering the mm with khugepaged. But I am sure no one
cares, we are in much big trouble if slot alloc is failing.

Although one could argue the same about this patch, I will
still say it is important to fix "flag is set but not registered
with khugepaged" because that just feels wrong.

Reviewed-by: Dev Jain <dev.jain@arm.com>


>  
>  	spin_lock(&khugepaged_mm_lock);
>  	mm_slot_insert(mm_slots_hash, mm, slot);



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
  2026-05-06  6:46 ` Dev Jain
@ 2026-05-06 10:51   ` Lance Yang
  2026-05-08 21:41     ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 10+ messages in thread
From: Lance Yang @ 2026-05-06 10:51 UTC (permalink / raw)
  To: dev.jain
  Cc: ye.liu, akpm, david, ljs, xhao, liuye, ziy, baolin.wang, liam,
	npache, ryan.roberts, baohua, lance.yang, akpm, linux-mm,
	linux-kernel


On Wed, May 06, 2026 at 12:16:35PM +0530, Dev Jain wrote:
>
>
>On 06/05/26 6:51 am, Ye Liu wrote:
>> From: Ye Liu <liuye@kylinos.cn>
>> 
>> __khugepaged_enter() sets MMF_VM_HUGEPAGE before allocating the
>> corresponding mm_slot.  If mm_slot_alloc() fails, the function
>> returns with the flag set but without inserting the mm into the
>> khugepaged tracking structures.
>> 
>> This leaves the mm in an inconsistent state: it is marked as
>> registered (MMF_VM_HUGEPAGE set), but will never be scanned by
>> khugepaged.  Future attempts to register the mm are skipped since
>> khugepaged_enter_vma() checks the flag and returns early.
>> 
>> Fix this by clearing MMF_VM_HUGEPAGE when mm_slot_alloc() fails,
>> restoring the ability to retry registration later.
>> 
>> Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for struct mm_slot")
>> Signed-off-by: Ye Liu <liuye@kylinos.cn>
>> ---
>> Changes since v1:
>> - Add Fixes tag as suggested by Dev Jain and Lance Yang
>> 
>>  mm/khugepaged.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 7d48d4fbd5f3..60ab7c1b61dd 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -559,8 +559,10 @@ void __khugepaged_enter(struct mm_struct *mm)
>>  		return;
>>  
>>  	slot = mm_slot_alloc(mm_slot_cache);
>> -	if (!slot)
>> +	if (!slot) {
>> +		mm_flags_clear(MMF_VM_HUGEPAGE, mm);
>>  		return;
>> +	}
>
>Note that, a racing khugepaged_enter_vma() may back off
>when it sees that MMF_VM_HUGEPAGE is set, but then the above
>clears the flag after slot alloc failure. So we end up not
>registering the mm with khugepaged. But I am sure no one
>cares, we are in much big trouble if slot alloc is failing.

Right. A racing khugepaged_enter_vma() can see MMF_VM_HUGEPAGE is set
and return, then !slot clears it again. If there is no later
khugepaged_enter_vma(), the mm still wouldn't get registered :)

>Although one could argue the same about this patch, I will
>still say it is important to fix "flag is set but not registered
>with khugepaged" because that just feels wrong.

+1 that is still better then keeping MMF_VM_HUGEPAGE set forever: any
later attempt can retry and install the mm_slot :)

Cheers, Lance


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
  2026-05-06 10:51   ` Lance Yang
@ 2026-05-08 21:41     ` David Hildenbrand (Arm)
  2026-05-09  1:57       ` Lance Yang
  2026-05-11  4:00       ` Dev Jain
  0 siblings, 2 replies; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-08 21:41 UTC (permalink / raw)
  To: Lance Yang, dev.jain
  Cc: ye.liu, akpm, ljs, xhao, liuye, ziy, baolin.wang, liam, npache,
	ryan.roberts, baohua, akpm, linux-mm, linux-kernel

On 5/6/26 12:51, Lance Yang wrote:
> 
> On Wed, May 06, 2026 at 12:16:35PM +0530, Dev Jain wrote:
>>
>>
>> On 06/05/26 6:51 am, Ye Liu wrote:
>>> From: Ye Liu <liuye@kylinos.cn>
>>>
>>> __khugepaged_enter() sets MMF_VM_HUGEPAGE before allocating the
>>> corresponding mm_slot.  If mm_slot_alloc() fails, the function
>>> returns with the flag set but without inserting the mm into the
>>> khugepaged tracking structures.
>>>
>>> This leaves the mm in an inconsistent state: it is marked as
>>> registered (MMF_VM_HUGEPAGE set), but will never be scanned by
>>> khugepaged.  Future attempts to register the mm are skipped since
>>> khugepaged_enter_vma() checks the flag and returns early.
>>>
>>> Fix this by clearing MMF_VM_HUGEPAGE when mm_slot_alloc() fails,
>>> restoring the ability to retry registration later.
>>>
>>> Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for struct mm_slot")
>>> Signed-off-by: Ye Liu <liuye@kylinos.cn>
>>> ---
>>> Changes since v1:
>>> - Add Fixes tag as suggested by Dev Jain and Lance Yang
>>>
>>>  mm/khugepaged.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 7d48d4fbd5f3..60ab7c1b61dd 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -559,8 +559,10 @@ void __khugepaged_enter(struct mm_struct *mm)
>>>  		return;
>>>  
>>>  	slot = mm_slot_alloc(mm_slot_cache);
>>> -	if (!slot)
>>> +	if (!slot) {
>>> +		mm_flags_clear(MMF_VM_HUGEPAGE, mm);
>>>  		return;
>>> +	}
>>
>> Note that, a racing khugepaged_enter_vma() may back off
>> when it sees that MMF_VM_HUGEPAGE is set, but then the above
>> clears the flag after slot alloc failure. So we end up not
>> registering the mm with khugepaged. But I am sure no one
>> cares, we are in much big trouble if slot alloc is failing.
> 
> Right. A racing khugepaged_enter_vma() can see MMF_VM_HUGEPAGE is set
> and return, then !slot clears it again. If there is no later
> khugepaged_enter_vma(), the mm still wouldn't get registered :)

So why not

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5f4e009593e0..78735f34250a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -437,13 +437,16 @@ void __khugepaged_enter(struct mm_struct *mm)

        /* __khugepaged_exit() must not run from under us */
        VM_BUG_ON_MM(collapse_test_exit(mm), mm);
-       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
-               return;

        slot = mm_slot_alloc(mm_slot_cache);
        if (!slot)
                return;

+       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm))) {
+               mm_slot_free(mm_slot_cache, slot);
+               return;
+       }
+
        spin_lock(&khugepaged_mm_lock);
        mm_slot_insert(mm_slots_hash, mm, slot);
        /*


Arguably, on the race described above, likely the thread seeing the
MMF_VM_HUGEPAGE would likely similarly have failed the allocation.

I'm fine with either, just wanted to raise the (cleaner looking?) alternative
where we just properly back off?

-- 
Cheers,

David


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
  2026-05-08 21:41     ` David Hildenbrand (Arm)
@ 2026-05-09  1:57       ` Lance Yang
  2026-05-11  4:00       ` Dev Jain
  1 sibling, 0 replies; 10+ messages in thread
From: Lance Yang @ 2026-05-09  1:57 UTC (permalink / raw)
  To: david
  Cc: lance.yang, dev.jain, ye.liu, akpm, ljs, xhao, liuye, ziy,
	baolin.wang, liam, npache, ryan.roberts, baohua, akpm, linux-mm,
	linux-kernel


On Fri, May 08, 2026 at 11:41:34PM +0200, David Hildenbrand (Arm) wrote:
>On 5/6/26 12:51, Lance Yang wrote:
>> 
>> On Wed, May 06, 2026 at 12:16:35PM +0530, Dev Jain wrote:
>>>
>>>
>>> On 06/05/26 6:51 am, Ye Liu wrote:
>>>> From: Ye Liu <liuye@kylinos.cn>
>>>>
>>>> __khugepaged_enter() sets MMF_VM_HUGEPAGE before allocating the
>>>> corresponding mm_slot.  If mm_slot_alloc() fails, the function
>>>> returns with the flag set but without inserting the mm into the
>>>> khugepaged tracking structures.
>>>>
>>>> This leaves the mm in an inconsistent state: it is marked as
>>>> registered (MMF_VM_HUGEPAGE set), but will never be scanned by
>>>> khugepaged.  Future attempts to register the mm are skipped since
>>>> khugepaged_enter_vma() checks the flag and returns early.
>>>>
>>>> Fix this by clearing MMF_VM_HUGEPAGE when mm_slot_alloc() fails,
>>>> restoring the ability to retry registration later.
>>>>
>>>> Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for struct mm_slot")
>>>> Signed-off-by: Ye Liu <liuye@kylinos.cn>
>>>> ---
>>>> Changes since v1:
>>>> - Add Fixes tag as suggested by Dev Jain and Lance Yang
>>>>
>>>>  mm/khugepaged.c | 4 +++-
>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index 7d48d4fbd5f3..60ab7c1b61dd 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -559,8 +559,10 @@ void __khugepaged_enter(struct mm_struct *mm)
>>>>  		return;
>>>>  
>>>>  	slot = mm_slot_alloc(mm_slot_cache);
>>>> -	if (!slot)
>>>> +	if (!slot) {
>>>> +		mm_flags_clear(MMF_VM_HUGEPAGE, mm);
>>>>  		return;
>>>> +	}
>>>
>>> Note that, a racing khugepaged_enter_vma() may back off
>>> when it sees that MMF_VM_HUGEPAGE is set, but then the above
>>> clears the flag after slot alloc failure. So we end up not
>>> registering the mm with khugepaged. But I am sure no one
>>> cares, we are in much big trouble if slot alloc is failing.
>> 
>> Right. A racing khugepaged_enter_vma() can see MMF_VM_HUGEPAGE is set
>> and return, then !slot clears it again. If there is no later
>> khugepaged_enter_vma(), the mm still wouldn't get registered :)
>
>So why not
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index 5f4e009593e0..78735f34250a 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -437,13 +437,16 @@ void __khugepaged_enter(struct mm_struct *mm)
>
>        /* __khugepaged_exit() must not run from under us */
>        VM_BUG_ON_MM(collapse_test_exit(mm), mm);
>-       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
>-               return;
>
>        slot = mm_slot_alloc(mm_slot_cache);
>        if (!slot)
>                return;
>
>+       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm))) {
>+               mm_slot_free(mm_slot_cache, slot);
>+               return;
>+       }
>+
>        spin_lock(&khugepaged_mm_lock);
>        mm_slot_insert(mm_slots_hash, mm, slot);
>        /*
>
>
>Arguably, on the race described above, likely the thread seeing the
>MMF_VM_HUGEPAGE would likely similarly have failed the allocation.

Right, LGTM!

>I'm fine with either, just wanted to raise the (cleaner looking?) alternative
>where we just properly back off?

Dev suggested the same thing[1] on v1 as well. We should have gone that
way :)

Allocating the slot first and only setting MMF_VM_HUGEPAGE after that
makes the race go away. If mm_slot_alloc() fails, there is nothing to
undo.

[1] https://lore.kernel.org/linux-mm/aed7c1d5-2189-4ee2-b0f3-ce5a3e3c2118@arm.com/

Cheers, Lance


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
  2026-05-08 21:41     ` David Hildenbrand (Arm)
  2026-05-09  1:57       ` Lance Yang
@ 2026-05-11  4:00       ` Dev Jain
  2026-05-11  5:40         ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 10+ messages in thread
From: Dev Jain @ 2026-05-11  4:00 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Lance Yang
  Cc: ye.liu, akpm, ljs, xhao, liuye, ziy, baolin.wang, liam, npache,
	ryan.roberts, baohua, akpm, linux-mm, linux-kernel



On 09/05/26 3:11 am, David Hildenbrand (Arm) wrote:
> On 5/6/26 12:51, Lance Yang wrote:
>>
>> On Wed, May 06, 2026 at 12:16:35PM +0530, Dev Jain wrote:
>>>
>>>
>>> On 06/05/26 6:51 am, Ye Liu wrote:
>>>> From: Ye Liu <liuye@kylinos.cn>
>>>>
>>>> __khugepaged_enter() sets MMF_VM_HUGEPAGE before allocating the
>>>> corresponding mm_slot.  If mm_slot_alloc() fails, the function
>>>> returns with the flag set but without inserting the mm into the
>>>> khugepaged tracking structures.
>>>>
>>>> This leaves the mm in an inconsistent state: it is marked as
>>>> registered (MMF_VM_HUGEPAGE set), but will never be scanned by
>>>> khugepaged.  Future attempts to register the mm are skipped since
>>>> khugepaged_enter_vma() checks the flag and returns early.
>>>>
>>>> Fix this by clearing MMF_VM_HUGEPAGE when mm_slot_alloc() fails,
>>>> restoring the ability to retry registration later.
>>>>
>>>> Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for struct mm_slot")
>>>> Signed-off-by: Ye Liu <liuye@kylinos.cn>
>>>> ---
>>>> Changes since v1:
>>>> - Add Fixes tag as suggested by Dev Jain and Lance Yang
>>>>
>>>>  mm/khugepaged.c | 4 +++-
>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index 7d48d4fbd5f3..60ab7c1b61dd 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -559,8 +559,10 @@ void __khugepaged_enter(struct mm_struct *mm)
>>>>  		return;
>>>>  
>>>>  	slot = mm_slot_alloc(mm_slot_cache);
>>>> -	if (!slot)
>>>> +	if (!slot) {
>>>> +		mm_flags_clear(MMF_VM_HUGEPAGE, mm);
>>>>  		return;
>>>> +	}
>>>
>>> Note that, a racing khugepaged_enter_vma() may back off
>>> when it sees that MMF_VM_HUGEPAGE is set, but then the above
>>> clears the flag after slot alloc failure. So we end up not
>>> registering the mm with khugepaged. But I am sure no one
>>> cares, we are in much big trouble if slot alloc is failing.
>>
>> Right. A racing khugepaged_enter_vma() can see MMF_VM_HUGEPAGE is set
>> and return, then !slot clears it again. If there is no later
>> khugepaged_enter_vma(), the mm still wouldn't get registered :)
> 
> So why not
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 5f4e009593e0..78735f34250a 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -437,13 +437,16 @@ void __khugepaged_enter(struct mm_struct *mm)
> 
>         /* __khugepaged_exit() must not run from under us */
>         VM_BUG_ON_MM(collapse_test_exit(mm), mm);
> -       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
> -               return;
> 
>         slot = mm_slot_alloc(mm_slot_cache);
>         if (!slot)
>                 return;
> 
> +       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm))) {
> +               mm_slot_free(mm_slot_cache, slot);
> +               return;
> +       }
> +
>         spin_lock(&khugepaged_mm_lock);
>         mm_slot_insert(mm_slots_hash, mm, slot);
>         /*
> 
> 
> Arguably, on the race described above, likely the thread seeing the
> MMF_VM_HUGEPAGE would likely similarly have failed the allocation.
> 
> I'm fine with either, just wanted to raise the (cleaner looking?) alternative
> where we just properly back off?

Yes this is also fine - I am overthinking but I wasn't going this way because ...
A process doing THP allocations will fail on the mm_flags_test_and_set everytime
after the first time. So we will have an unconditional overhead of
mm_slot_alloc in the fault handler.

Again I am overthinking : )

> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
  2026-05-11  4:00       ` Dev Jain
@ 2026-05-11  5:40         ` David Hildenbrand (Arm)
  2026-05-11  5:44           ` Dev Jain
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-11  5:40 UTC (permalink / raw)
  To: Dev Jain, Lance Yang
  Cc: ye.liu, akpm, ljs, xhao, liuye, ziy, baolin.wang, liam, npache,
	ryan.roberts, baohua, akpm, linux-mm, linux-kernel

On 5/11/26 06:00, Dev Jain wrote:
> 
> 
> On 09/05/26 3:11 am, David Hildenbrand (Arm) wrote:
>> On 5/6/26 12:51, Lance Yang wrote:
>>>
>>>
>>> Right. A racing khugepaged_enter_vma() can see MMF_VM_HUGEPAGE is set
>>> and return, then !slot clears it again. If there is no later
>>> khugepaged_enter_vma(), the mm still wouldn't get registered :)
>>
>> So why not
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 5f4e009593e0..78735f34250a 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -437,13 +437,16 @@ void __khugepaged_enter(struct mm_struct *mm)
>>
>>         /* __khugepaged_exit() must not run from under us */
>>         VM_BUG_ON_MM(collapse_test_exit(mm), mm);
>> -       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
>> -               return;
>>
>>         slot = mm_slot_alloc(mm_slot_cache);
>>         if (!slot)
>>                 return;
>>
>> +       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm))) {
>> +               mm_slot_free(mm_slot_cache, slot);
>> +               return;
>> +       }
>> +
>>         spin_lock(&khugepaged_mm_lock);
>>         mm_slot_insert(mm_slots_hash, mm, slot);
>>         /*
>>
>>
>> Arguably, on the race described above, likely the thread seeing the
>> MMF_VM_HUGEPAGE would likely similarly have failed the allocation.
>>
>> I'm fine with either, just wanted to raise the (cleaner looking?) alternative
>> where we just properly back off?
> 
> Yes this is also fine - I am overthinking but I wasn't going this way because ...
> A process doing THP allocations will fail on the mm_flags_test_and_set everytime
> after the first time.
We should perform a mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) test before
calling the function when the flag might not be set yet: in khugepaged_enter_vma()

khugepaged_fork() should only get called once per process.

Which makes sense, because mm_flags_test_and_set() is expensive.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
  2026-05-11  5:40         ` David Hildenbrand (Arm)
@ 2026-05-11  5:44           ` Dev Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Dev Jain @ 2026-05-11  5:44 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Lance Yang
  Cc: ye.liu, akpm, ljs, xhao, liuye, ziy, baolin.wang, liam, npache,
	ryan.roberts, baohua, akpm, linux-mm, linux-kernel



On 11/05/26 11:10 am, David Hildenbrand (Arm) wrote:
> On 5/11/26 06:00, Dev Jain wrote:
>>
>>
>> On 09/05/26 3:11 am, David Hildenbrand (Arm) wrote:
>>> On 5/6/26 12:51, Lance Yang wrote:
>>>>
>>>>
>>>> Right. A racing khugepaged_enter_vma() can see MMF_VM_HUGEPAGE is set
>>>> and return, then !slot clears it again. If there is no later
>>>> khugepaged_enter_vma(), the mm still wouldn't get registered :)
>>>
>>> So why not
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 5f4e009593e0..78735f34250a 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -437,13 +437,16 @@ void __khugepaged_enter(struct mm_struct *mm)
>>>
>>>         /* __khugepaged_exit() must not run from under us */
>>>         VM_BUG_ON_MM(collapse_test_exit(mm), mm);
>>> -       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
>>> -               return;
>>>
>>>         slot = mm_slot_alloc(mm_slot_cache);
>>>         if (!slot)
>>>                 return;
>>>
>>> +       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm))) {
>>> +               mm_slot_free(mm_slot_cache, slot);
>>> +               return;
>>> +       }
>>> +
>>>         spin_lock(&khugepaged_mm_lock);
>>>         mm_slot_insert(mm_slots_hash, mm, slot);
>>>         /*
>>>
>>>
>>> Arguably, on the race described above, likely the thread seeing the
>>> MMF_VM_HUGEPAGE would likely similarly have failed the allocation.
>>>
>>> I'm fine with either, just wanted to raise the (cleaner looking?) alternative
>>> where we just properly back off?
>>
>> Yes this is also fine - I am overthinking but I wasn't going this way because ...
>> A process doing THP allocations will fail on the mm_flags_test_and_set everytime
>> after the first time.
> We should perform a mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) test before
> calling the function when the flag might not be set yet: in khugepaged_enter_vma()

Ah that slipped my mind, you are right.

> 
> khugepaged_fork() should only get called once per process.
> 
> Which makes sense, because mm_flags_test_and_set() is expensive.
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-05-11  5:45 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-06  1:21 [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure Ye Liu
2026-05-06  2:05 ` Lance Yang
2026-05-06  5:17 ` Baolin Wang
2026-05-06  6:46 ` Dev Jain
2026-05-06 10:51   ` Lance Yang
2026-05-08 21:41     ` David Hildenbrand (Arm)
2026-05-09  1:57       ` Lance Yang
2026-05-11  4:00       ` Dev Jain
2026-05-11  5:40         ` David Hildenbrand (Arm)
2026-05-11  5:44           ` Dev Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox