From: Lance Yang
To: david@kernel.org
Cc: lance.yang@linux.dev, dev.jain@arm.com, ye.liu@linux.dev,
 akpm@linux-foundation.org, ljs@kernel.org, xhao@linux.alibaba.com,
 liuye@kylinos.cn, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
 liam@infradead.org, npache@redhat.com, ryan.roberts@arm.com,
 baohua@kernel.org, akpm@linux-foudation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/khugepaged: clear MMF_VM_HUGEPAGE on mm_slot_alloc() failure
Date: Sat, 9 May 2026 09:57:22 +0800
Message-Id: <20260509015723.9467-1-lance.yang@linux.dev>
In-Reply-To: <6b6b094b-8dcb-423b-bb86-ef1439887eed@kernel.org>
References: <6b6b094b-8dcb-423b-bb86-ef1439887eed@kernel.org>

On Fri, May 08, 2026 at 11:41:34PM +0200, David Hildenbrand (Arm) wrote:
>On 5/6/26 12:51, Lance Yang wrote:
>>
>> On Wed, May 06, 2026 at 12:16:35PM +0530, Dev Jain wrote:
>>>
>>>
>>> On 06/05/26 6:51 am, Ye Liu wrote:
>>>> From: Ye Liu
>>>>
>>>> __khugepaged_enter() sets MMF_VM_HUGEPAGE before allocating the
>>>> corresponding mm_slot. If mm_slot_alloc() fails, the function
>>>> returns with the flag set but without inserting the mm into the
>>>> khugepaged tracking structures.
>>>>
>>>> This leaves the mm in an inconsistent state: it is marked as
>>>> registered (MMF_VM_HUGEPAGE set), but will never be scanned by
>>>> khugepaged. Future attempts to register the mm are skipped since
>>>> khugepaged_enter_vma() checks the flag and returns early.
>>>>
>>>> Fix this by clearing MMF_VM_HUGEPAGE when mm_slot_alloc() fails,
>>>> restoring the ability to retry registration later.
>>>>
>>>> Fixes: 16618670276a ("mm: khugepaged: avoid pointless allocation for struct mm_slot")
>>>> Signed-off-by: Ye Liu
>>>> ---
>>>> Changes since v1:
>>>> - Add Fixes tag as suggested by Dev Jain and Lance Yang
>>>>
>>>>  mm/khugepaged.c | 4 +++-
>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index 7d48d4fbd5f3..60ab7c1b61dd 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -559,8 +559,10 @@ void __khugepaged_enter(struct mm_struct *mm)
>>>>  		return;
>>>>
>>>>  	slot = mm_slot_alloc(mm_slot_cache);
>>>> -	if (!slot)
>>>> +	if (!slot) {
>>>> +		mm_flags_clear(MMF_VM_HUGEPAGE, mm);
>>>>  		return;
>>>> +	}
>>>
>>> Note that, a racing khugepaged_enter_vma() may back off
>>> when it sees that MMF_VM_HUGEPAGE is set, but then the above
>>> clears the flag after slot alloc failure. So we end up not
>>> registering the mm with khugepaged. But I am sure no one
>>> cares, we are in much big trouble if slot alloc is failing.
>>
>> Right. A racing khugepaged_enter_vma() can see MMF_VM_HUGEPAGE is set
>> and return, then !slot clears it again.
>> If there is no later khugepaged_enter_vma(), the mm still wouldn't
>> get registered :)
>
>So why not
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index 5f4e009593e0..78735f34250a 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -437,13 +437,16 @@ void __khugepaged_enter(struct mm_struct *mm)
>
> 	/* __khugepaged_exit() must not run from under us */
> 	VM_BUG_ON_MM(collapse_test_exit(mm), mm);
>-	if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
>-		return;
>
> 	slot = mm_slot_alloc(mm_slot_cache);
> 	if (!slot)
> 		return;
>
>+	if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm))) {
>+		mm_slot_free(mm_slot_cache, slot);
>+		return;
>+	}
>+
> 	spin_lock(&khugepaged_mm_lock);
> 	mm_slot_insert(mm_slots_hash, mm, slot);
> 	/*
>
>
>Arguably, on the race described above, likely the thread seeing the
>MMF_VM_HUGEPAGE would likely similarly have failed the allocation.

Right, LGTM!

>I'm fine with either, just wanted to raise the (cleaner looking?) alternative
>where we just properly back off?

Dev suggested the same thing[1] on v1 as well. We should have gone that
way :)

Allocating the slot first and only setting MMF_VM_HUGEPAGE after that
makes the race go away. If mm_slot_alloc() fails, there is nothing to
undo.

[1] https://lore.kernel.org/linux-mm/aed7c1d5-2189-4ee2-b0f3-ce5a3e3c2118@arm.com/

Cheers,
Lance
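
P.S. A minimal userspace sketch of the alloc-first ordering discussed
above, in case it helps to see it in isolation. This is not the kernel
code: struct sketch_mm, sketch_slot_alloc(), sketch_test_and_set() and
sketch_enter() are hypothetical stand-ins made up for illustration. C11
atomic_fetch_or() is assumed to model mm_flags_test_and_set(), and
malloc() to model mm_slot_alloc().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the MMF_VM_HUGEPAGE bit. */
#define SKETCH_VM_HUGEPAGE 0x1u

/* Hypothetical stand-in for struct mm_struct; not the real layout. */
struct sketch_mm {
	atomic_uint flags;	/* models the mm flags word */
	void *slot;		/* models the registered slot, NULL if none */
};

/* Models mm_slot_alloc(); 'fail' simulates an allocation failure. */
static void *sketch_slot_alloc(bool fail)
{
	return fail ? NULL : malloc(16);
}

/* Models mm_flags_test_and_set(): true if the flag was already set. */
static bool sketch_test_and_set(struct sketch_mm *mm)
{
	unsigned int old = atomic_fetch_or(&mm->flags, SKETCH_VM_HUGEPAGE);

	return (old & SKETCH_VM_HUGEPAGE) != 0;
}

/* Returns true only if this call registered the mm. */
static bool sketch_enter(struct sketch_mm *mm, bool fail_alloc)
{
	void *slot;

	assert(mm != NULL);

	/* Allocate first: on failure the flag was never touched. */
	slot = sketch_slot_alloc(fail_alloc);
	if (!slot)
		return false;

	/* Flag already set: another caller registered, so back off. */
	if (sketch_test_and_set(mm)) {
		free(slot);
		return false;
	}

	/* Stand-in for inserting the slot under khugepaged_mm_lock. */
	mm->slot = slot;
	return true;
}
```

With this ordering a failed allocation returns before the flag is
touched, so a later sketch_enter() on the same mm can still register it,
and a caller that loses the test-and-set only has to free its own slot.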