From: Pratyush Yadav
To: Mike Rapoport
Cc: Pratyush Yadav, Pasha Tatashin, Alexander Graf, Muchun Song,
 Oscar Salvador, David Hildenbrand, Andrew Morton, Jason Miu,
 Jork Loeser, kexec@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 16/18] memblock: make HugeTLB bootmem allocation work with KHO
In-Reply-To: <178143855120.2123877.5431342391381982046.b4-review@b4>
 (Mike Rapoport's message of "Sun, 14 Jun 2026 15:02:31 +0300")
References: <20260605183501.3884950-1-pratyush@kernel.org>
 <20260605183501.3884950-17-pratyush@kernel.org>
 <178143855120.2123877.5431342391381982046.b4-review@b4>
Date: Mon, 15 Jun 2026 15:35:39 +0200
Message-ID: <2vxzpl1soris.fsf@kernel.org>
User-Agent: Gnus/5.13 (Gnus v5.13)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain

On Sun, Jun 14 2026, Mike Rapoport wrote:

> On Fri, 05 Jun 2026 20:34:49 +0200, Pratyush Yadav wrote:
>> Gigantic huge page allocation is somewhat broken currently when KHO is
>> used.
>>
>> Firstly, they break KHO scratch size accounting. RSRV_KERN is used to
>> track how much memory is reserved for use by the kernel. Since
>> alloc_bootmem() calls the memblock_alloc*() APIs, the hugepages
>
> hugetlb::alloc_bootmem()

ACK.

>
>> [...]
>> First, it does not use mirrored memory for hugetlb. Mirrored memory is a
>> limited resource that is best saved for kernel data structures, not user
>> memory.
>>
>> Second, if the memory found overlaps with KHO scratch areas, it discards
>> the memory and retries.
>
> This sentence is somewhat hard to parse.

Okay, let me retry:

  Second, if the free memory area found by memblock_find_in_range_node()
  is a part of a KHO scratch area, the free area is not used. Allocation
  is retried starting after the free area to ensure no hugepages come
  from KHO scratch.

Any better?

>
>>
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 6349c48154f4..131e54dd5d8d 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1756,6 +1761,69 @@ void * __init memblock_alloc_try_nid_raw(
>> [ ... skip 51 lines ... ]
>> +	if (memblock_bottom_up())
>> +		start = addr + size;
>> +	else
>> +		start = addr - size;
>> +
>> +	goto retry;
>
> Hmm, two goto retry don't seem nice :/
> Although I can't see how to improve it really.

Dunno, looked easy enough to understand to me.

>
> Maybe add a helper for going the node fallback?

There is a small downside. There will then be no way to know the fallback
was tried already, so if a retry is done because of scratch overlap, the
fallback needs to be done again. I don't think it should be too bad, so
if you still prefer this then I can do it.

-- 
Regards,
Pratyush Yadav
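To make the reworded "skip scratch and retry" sentence concrete, here is a
minimal user-space sketch of the idea. It is a toy model and not code from
the patch: find_free() is a hypothetical stand-in for
memblock_find_in_range_node(), the scratch[] table and the MEM_START/MEM_END
bounds are made up, and the way the search window is narrowed (raising start
for bottom-up, lowering end for top-down) is my own simplification rather
than what the elided hunk does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: half-open physical ranges [base, base + size). */
struct range {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical memory bounds; everything inside is treated as free. */
#define MEM_START 0x1000ULL
#define MEM_END   0x8000ULL

/* Hypothetical stand-in for the KHO scratch areas. */
static const struct range scratch[] = {
	{ 0x1000, 0x1000 },	/* [0x1000, 0x2000) */
	{ 0x7000, 0x1000 },	/* [0x7000, 0x8000) */
};

static bool overlaps_scratch(uint64_t addr, uint64_t size)
{
	for (size_t i = 0; i < sizeof(scratch) / sizeof(scratch[0]); i++) {
		uint64_t s_end = scratch[i].base + scratch[i].size;

		if (addr < s_end && scratch[i].base < addr + size)
			return true;
	}
	return false;
}

/*
 * Hypothetical finder: returns a size-aligned address in [start, end),
 * lowest for bottom-up and highest for top-down, or 0 if nothing fits.
 */
static uint64_t find_free(uint64_t start, uint64_t end, uint64_t size,
			  bool bottom_up)
{
	uint64_t addr;

	if (start < MEM_START)
		start = MEM_START;
	if (end > MEM_END)
		end = MEM_END;
	if (end < start || end - start < size)
		return 0;

	if (bottom_up) {
		addr = ((start + size - 1) / size) * size;
		return addr + size <= end ? addr : 0;
	}

	addr = ((end - size) / size) * size;
	return addr >= start ? addr : 0;
}

/* Retry until the candidate does not touch any scratch area. */
static uint64_t alloc_avoiding_scratch(uint64_t size, bool bottom_up)
{
	uint64_t start = MEM_START, end = MEM_END;

	assert(size > 0);
	for (;;) {
		uint64_t addr = find_free(start, end, size, bottom_up);

		if (!addr)
			return 0;	/* nothing left to try */
		if (!overlaps_scratch(addr, size))
			return addr;

		/* Candidate touches scratch: move past it and search again. */
		if (bottom_up)
			start = addr + size;
		else
			end = addr;
	}
}
```

In this model each retry strictly shrinks the search window, so the loop
always terminates, either with an address outside scratch or with 0 when
nothing fits.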