All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lance Yang <lance.yang@linux.dev>
To: david@kernel.org
Cc: lance.yang@linux.dev, dev.jain@arm.com, ye.liu@linux.dev,
	akpm@linux-foundation.org, ljs@kernel.org,
	xhao@linux.alibaba.com, liuye@kylinos.cn, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, liam@infradead.org,
	npache@redhat.com, ryan.roberts@arm.com, baohua@kernel.org,
	akpm@linux-foudation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
Date: Sat,  9 May 2026 09:57:22 +0800	[thread overview]
Message-ID: <20260509015723.9467-1-lance.yang@linux.dev> (raw)
In-Reply-To: <6b6b094b-8dcb-423b-bb86-ef1439887eed@kernel.org>


On Fri, May 08, 2026 at 11:41:34PM +0200, David Hildenbrand (Arm) wrote:
>On 5/6/26 12:51, Lance Yang wrote:
>> 
>> On Wed, May 06, 2026 at 12:16:35PM +0530, Dev Jain wrote:
>>>
>>>
>>> On 06/05/26 6:51 am, Ye Liu wrote:
>>>> From: Ye Liu <liuye@kylinos.cn>
>>>>
>>>> __khugepaged_enter() sets MMF_VM_HUGEPAGE before allocating the
>>>> corresponding mm_slot.  If mm_slot_alloc() fails, the function
>>>> returns with the flag set but without inserting the mm into the
>>>> khugepaged tracking structures.
>>>>
>>>> This leaves the mm in an inconsistent state: it is marked as
>>>> registered (MMF_VM_HUGEPAGE set), but will never be scanned by
>>>> khugepaged.  Future attempts to register the mm are skipped since
>>>> khugepaged_enter_vma() checks the flag and returns early.
>>>>
>>>> Fix this by clearing MMF_VM_HUGEPAGE when mm_slot_alloc() fails,
>>>> restoring the ability to retry registration later.
>>>>
>>>> Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for struct mm_slot")
>>>> Signed-off-by: Ye Liu <liuye@kylinos.cn>
>>>> ---
>>>> Changes since v1:
>>>> - Add Fixes tag as suggested by Dev Jain and Lance Yang
>>>>
>>>>  mm/khugepaged.c | 4 +++-
>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index 7d48d4fbd5f3..60ab7c1b61dd 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -559,8 +559,10 @@ void __khugepaged_enter(struct mm_struct *mm)
>>>>  		return;
>>>>  
>>>>  	slot = mm_slot_alloc(mm_slot_cache);
>>>> -	if (!slot)
>>>> +	if (!slot) {
>>>> +		mm_flags_clear(MMF_VM_HUGEPAGE, mm);
>>>>  		return;
>>>> +	}
>>>
>>> Note that, a racing khugepaged_enter_vma() may back off
>>> when it sees that MMF_VM_HUGEPAGE is set, but then the above
>>> clears the flag after slot alloc failure. So we end up not
>>> registering the mm with khugepaged. But I am sure no one
>>> cares, we are in much big trouble if slot alloc is failing.
>> 
>> Right. A racing khugepaged_enter_vma() can see MMF_VM_HUGEPAGE is set
>> and return, then !slot clears it again. If there is no later
>> khugepaged_enter_vma(), the mm still wouldn't get registered :)
>
>So why not
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index 5f4e009593e0..78735f34250a 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -437,13 +437,16 @@ void __khugepaged_enter(struct mm_struct *mm)
>
>        /* __khugepaged_exit() must not run from under us */
>        VM_BUG_ON_MM(collapse_test_exit(mm), mm);
>-       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
>-               return;
>
>        slot = mm_slot_alloc(mm_slot_cache);
>        if (!slot)
>                return;
>
>+       if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm))) {
>+               mm_slot_free(mm_slot_cache, slot);
>+               return;
>+       }
>+
>        spin_lock(&khugepaged_mm_lock);
>        mm_slot_insert(mm_slots_hash, mm, slot);
>        /*
>
>
>Arguably, on the race described above, likely the thread seeing the
>MMF_VM_HUGEPAGE would likely similarly have failed the allocation.

Right, LGTM!

>I'm fine with either, just wanted to raise the (cleaner looking?) alternative
>where we just properly back off?

Dev suggested the same thing[1] on v1 as well. We should have gone that
way :)

Allocating the slot first and only setting MMF_VM_HUGEPAGE after that
makes the race go away. If mm_slot_alloc() fails, there is nothing to
undo.

[1] https://lore.kernel.org/linux-mm/aed7c1d5-2189-4ee2-b0f3-ce5a3e3c2118@arm.com/

Cheers, Lance


  reply	other threads:[~2026-05-09  1:57 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-06  1:21 [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure Ye Liu
2026-05-06  2:05 ` Lance Yang
2026-05-06  5:17 ` Baolin Wang
2026-05-06  6:46 ` Dev Jain
2026-05-06 10:51   ` Lance Yang
2026-05-08 21:41     ` David Hildenbrand (Arm)
2026-05-09  1:57       ` Lance Yang [this message]
2026-05-11  4:00       ` Dev Jain
2026-05-11  5:40         ` David Hildenbrand (Arm)
2026-05-11  5:44           ` Dev Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260509015723.9467-1-lance.yang@linux.dev \
    --to=lance.yang@linux.dev \
    --cc=akpm@linux-foudation.org \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liuye@kylinos.cn \
    --cc=ljs@kernel.org \
    --cc=npache@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=xhao@linux.alibaba.com \
    --cc=ye.liu@linux.dev \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.