From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-176.mta1.migadu.com (out-176.mta1.migadu.com [95.215.58.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D38E319F135 for ; Tue, 20 Jan 2026 02:50:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.176 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768877452; cv=none; b=oTI8GGHL/+eOxsuvTC+xcUpYPCkviAGF0JuQhrOreY44XE1ayiS0x/TJch8liI4/rryIfEPdu4DJxMCo6iZUKqFt7BLTeDJGX5Pk69RBzb5V7xpsU0YrvA+uFsYQm28ieFQEu4Rp0fpnOOPdJr1LmLz61pd5z5EdznG7znyDGWQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768877452; c=relaxed/simple; bh=Mebmj9B6gzVsk0nMrHB+h3Y+buG8F2knF0O3EpGD5QU=; h=Content-Type:Mime-Version:Subject:From:In-Reply-To:Date:Cc: Message-Id:References:To; b=hpgKEt8/3MS6N58vxZSeu3me3lzBABpGNn0SeseAeVRFS5+vDC2NSEHnStRPGicNP1++uAa4c31VWPfJlNzrPNmLak2mW8KD3VmaYZrvj2zjk8vIMQCgpMUGoSpwy/1P8Zb3+EvKHNbPWxvCJX+O2q/MXdmeqvWyzGD5FnaBOag= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=tQOwTb9/; arc=none smtp.client-ip=95.215.58.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="tQOwTb9/" Content-Type: text/plain; charset=us-ascii DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1768877447; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aXiYX4KamXJdrCUq7b0aguR0/Zq+QSB8PNxFvwlaWr4=; b=tQOwTb9/2aXdO3zPJRLpLp5A18+l5o5AmBPtdS5XKQLRREj8EYCvEXqUhem24ohyoA55N3 HU/RZBDkmA1EhaUgt3z4J+gkypkiwczEcv/Z4g4p9+pfzqkwzG1bqkcYArpGotFIhZ2lNt Fa/In/t5d/GjD0GoM4ZGVCdzCgN93UY= Precedence: bulk X-Mailing-List: linux-doc@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.300.41.1.7\)) Subject: Re: [PATCHv3 10/15] mm/hugetlb: Remove fake head pages X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Muchun Song In-Reply-To: Date: Tue, 20 Jan 2026 10:50:03 +0800 Cc: "David Hildenbrand (Red Hat)" , Andrew Morton , Matthew Wilcox , Usama Arif , Frank van der Linden , Oscar Salvador , Mike Rapoport , Vlastimil Babka , Lorenzo Stoakes , Zi Yan , Baoquan He , Michal Hocko , Johannes Weiner , Jonathan Corbet , kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org Content-Transfer-Encoding: quoted-printable Message-Id: <9C029C3B-E140-4FC2-A680-8580AC753B69@linux.dev> References: <20260115144604.822702-1-kas@kernel.org> <20260115144604.822702-11-kas@kernel.org> <30ae1623-63f9-4729-9c19-9b0a9a0ae9f1@kernel.org> <53980726-C7F0-4648-99E9-89E10645F2E7@linux.dev> <0F1C93F3-9A1A-4929-9157-589CF8C0588D@linux.dev> To: Kiryl Shutsemau X-Migadu-Flow: FLOW_OUT > On Jan 19, 2026, at 23:15, Kiryl Shutsemau wrote: >=20 > On Sat, Jan 17, 2026 at 10:38:48AM +0800, Muchun Song wrote: >>=20 >>=20 >>> On Jan 16, 2026, at 23:52, Kiryl Shutsemau wrote: >>>=20 >>> On Fri, Jan 16, 2026 at 10:38:02AM +0800, Muchun Song wrote: >>>>=20 >>>>=20 >>>>> On Jan 16, 2026, at 01:23, Kiryl Shutsemau wrote: >>>>>=20 >>>>> On Thu, Jan 15, 2026 at 05:49:43PM +0100, David Hildenbrand (Red = Hat) wrote: >>>>>> On 1/15/26 15:45, Kiryl Shutsemau wrote: >>>>>>> HugeTLB Vmemmap Optimization (HVO) reduces memory usage by = freeing most >>>>>>> vmemmap pages for huge pages and remapping the freed range to a = single >>>>>>> page containing the struct page metadata. >>>>>>>=20 >>>>>>> With the new mask-based compound_info encoding (for power-of-2 = struct >>>>>>> page sizes), all tail pages of the same order are now identical >>>>>>> regardless of which compound page they belong to. This means the = tail >>>>>>> pages can be truly shared without fake heads. >>>>>>>=20 >>>>>>> Allocate a single page of initialized tail struct pages per NUMA = node >>>>>>> per order in the vmemmap_tails[] array in pglist_data. All huge = pages >>>>>>> of that order on the node share this tail page, mapped read-only = into >>>>>>> their vmemmap. The head page remains unique per huge page. >>>>>>>=20 >>>>>>> This eliminates fake heads while maintaining the same memory = savings, >>>>>>> and simplifies compound_head() by removing fake head detection. >>>>>>>=20 >>>>>>> Signed-off-by: Kiryl Shutsemau >>>>>>> --- >>>>>>> include/linux/mmzone.h | 16 ++++++++++++++- >>>>>>> mm/hugetlb_vmemmap.c | 44 = ++++++++++++++++++++++++++++++++++++++++-- >>>>>>> mm/sparse-vmemmap.c | 44 = ++++++++++++++++++++++++++++++++++-------- >>>>>>> 3 files changed, 93 insertions(+), 11 deletions(-) >>>>>>>=20 >>>>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >>>>>>> index 322ed4c42cfc..2ee3eb610291 100644 >>>>>>> --- a/include/linux/mmzone.h >>>>>>> +++ b/include/linux/mmzone.h >>>>>>> @@ -82,7 +82,11 @@ >>>>>>> * currently expect (see CONFIG_HAVE_GIGANTIC_FOLIOS): with = hugetlb, we expect >>>>>>> * no folios larger than 16 GiB on 64bit and 1 GiB on 32bit. >>>>>>> */ >>>>>>> -#define MAX_FOLIO_ORDER get_order(IS_ENABLED(CONFIG_64BIT) ? = SZ_16G : SZ_1G) >>>>>>> +#ifdef CONFIG_64BIT >>>>>>> +#define MAX_FOLIO_ORDER (34 - PAGE_SHIFT) >>>>>>> +#else >>>>>>> +#define MAX_FOLIO_ORDER (30 - PAGE_SHIFT) >>>>>>> +#endif >>>>>>=20 >>>>>> Where do these magic values stem from, and how do they related to = the >>>>>> comment above that clearly spells out 16G vs. 1G ? >>>>>=20 >>>>> This doesn't change the resulting value: 1UL << 34 is 16GiB, 1UL = << 30 >>>>> is 1G. Subtract PAGE_SHIFT to get the order. >>>>>=20 >>>>> The change allows the value to be used to define NR_VMEMMAP_TAILS = which >>>>> is used specify size of vmemmap_tails array. >>>>=20 >>>> How about allocate ->vmemmap_tails array dynamically? If sizeof of = struct >>>> page is not power of two, then we could optimize away this array. = Besides, >>>> the original MAX_FOLIO_ORDER could work as well. >>>=20 >>> This is tricky. >>>=20 >>> We need vmemmap_tails array to be around early, in >>> hugetlb_vmemmap_init_early(). By the time, we don't have slab >>> functional yet. >>=20 >> I mean zero-size array at the end of pg_data_t, no slab is needed. >=20 > For !NUMA, the struct is in BSS. See contig_page_data. Right. I missed that. >=20 > Dynamic array won't fly there. >=20 > --=20 > Kiryl Shutsemau / Kirill A. Shutemov