From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 12 May 2026 15:25:30 +0200
From: Oscar Salvador
To: ackerleytng@google.com
Cc: Muchun Song, David Hildenbrand, Andrew Morton, fvdl@google.com,
 jiaqiyan@google.com, joshua.hahnjy@gmail.com, jthoughton@google.com,
 mhocko@kernel.org, michael.roth@amd.com, pasha.tatashin@soleen.com,
 pbonzini@redhat.com, peterx@redhat.com, pratyush@kernel.org,
 rick.p.edgecombe@intel.com, rientjes@google.com,
 roman.gushchin@linux.dev, seanjc@google.com, shakeel.butt@linux.dev,
 shivankg@amd.com, vannapurve@google.com, yan.y.zhao@intel.com,
 Dan Williams, Jason Gunthorpe, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 6/6] mm: hugetlb: Refactor out hugetlb_alloc_folio()
References: <20260506-hugetlb-open-up-v2-0-826a0c5f28fc@google.com>
 <20260506-hugetlb-open-up-v2-6-826a0c5f28fc@google.com>
Precedence: bulk
X-Mailing-List:
 linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20260506-hugetlb-open-up-v2-6-826a0c5f28fc@google.com>

On Wed, May 06, 2026 at 08:54:42AM -0700, Ackerley Tng via B4 Relay wrote:
> From: Ackerley Tng
>
> Refactor out hugetlb_alloc_folio() from alloc_hugetlb_folio(), which
> handles allocation of a folio and memory and HugeTLB charging to cgroups.
>
> This refactoring decouples the HugeTLB page allocation from VMAs,
> specifically:
>
> 1. Reservations (as in resv_map) are stored in the vma
> 2. mpol is stored at vma->vm_policy
> 3. A vma must be used for allocation even if the pages are not meant to be
>    used by host process.
>
> Without this coupling, VMAs are no longer a requirement for
> allocation. This opens up the allocation routine for usage without VMAs,
> which will allow guest_memfd to use HugeTLB as a more generic allocator of
> huge pages, since guest_memfd memory may not have any associated VMAs by
> design. In addition, direct allocations from HugeTLB could possibly be
> refactored to avoid the use of a pseudo-VMA.
>
> Also, this decouples HugeTLB page allocation from HugeTLBfs, where the
> subpool is stored at the fs mount. This is also a requirement for
> guest_memfd, where the plan is to have a subpool created per-fd and stored
> on the inode.
>
> No functional change intended.
>
> Signed-off-by: Ackerley Tng

I have yet to review this more thoroughly, but I have a comment below:

> ---
>  include/linux/hugetlb.h | 3 +
>  mm/hugetlb.c | 179 ++++++++++++++++++++++++++----------------------
>  2 files changed, 100 insertions(+), 82 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 93418625d3c5f..ec205d8580885 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -705,6 +705,9 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
>  int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list);
>  int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
>  void wait_for_freed_hugetlb_folios(void);
> +struct folio *hugetlb_alloc_folio(struct hstate *h, struct hugepage_subpool *spool,
> +		struct mempolicy *mpol, int nid, nodemask_t *nodemask,
> +		bool charge_hugetlb_cgroup_rsvd, bool use_global_reservation);
>  struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  		unsigned long addr, bool cow_from_owner);
>  struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4159b3565a9be..a1c5b94e52e0a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2821,6 +2821,88 @@ void wait_for_freed_hugetlb_folios(void)
>  	flush_work(&free_hpage_work);
>  }
>
> +struct folio *hugetlb_alloc_folio(struct hstate *h, struct hugepage_subpool *spool,
> +		struct mempolicy *mpol, int nid, nodemask_t *nodemask,
> +		bool charge_hugetlb_cgroup_rsvd, bool use_global_reservation)

I think I would put that information into a context struct that we pass
to hugetlb_alloc_folio(). Otherwise this seems too overloaded, and we may
need to add more parameters in the future to tweak the allocation even
further.

E.g.:

 struct hugetlb_alloc_ctxt {
	struct hstate *h;
	struct hugepage_subpool *spool;
	gfp_t gfp_mask;
	...
 };

Maybe we can go even further and convert those booleans into action flags.

I have the feeling that, as is, this is quite ad-hoc code. If we want to
open up hugetlb allocations to the world, we should make the API as
generic as possible, so that we do not have to change it whenever a new
user pops up.

-- 
Oscar Salvador
SUSE Labs
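
To make the context-struct and action-flags idea above a bit more concrete, here is a rough userspace sketch. It is not taken from the patch or from the kernel tree. The kernel types are replaced by empty stand-ins so that it compiles on its own. The flag names and the two helpers are hypothetical, and so are all struct fields except h, spool and gfp_mask, which are carried over from the parameter list quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for kernel types, only so this sketch builds in userspace. */
struct hstate { int unused; };
struct hugepage_subpool { int unused; };
struct mempolicy { int unused; };
typedef unsigned int gfp_t;
typedef struct { unsigned long bits; } nodemask_t;

/* Hypothetical action flags replacing the two bool parameters. */
#define HUGETLB_ALLOC_CHARGE_CGROUP_RSVD	(1u << 0)
#define HUGETLB_ALLOC_USE_GLOBAL_RESV		(1u << 1)

/* h, spool and gfp_mask come from the review; the rest mirrors the
 * remaining parameters of the posted hugetlb_alloc_folio() signature. */
struct hugetlb_alloc_ctxt {
	struct hstate *h;
	struct hugepage_subpool *spool;
	struct mempolicy *mpol;
	int nid;
	nodemask_t *nodemask;
	gfp_t gfp_mask;
	unsigned int flags;
};

/* Hypothetical helper: map the two bools of the posted signature to flags. */
static unsigned int hugetlb_alloc_flags(bool charge_hugetlb_cgroup_rsvd,
					bool use_global_reservation)
{
	unsigned int flags = 0;

	if (charge_hugetlb_cgroup_rsvd)
		flags |= HUGETLB_ALLOC_CHARGE_CGROUP_RSVD;
	if (use_global_reservation)
		flags |= HUGETLB_ALLOC_USE_GLOBAL_RESV;
	return flags;
}

/* Hypothetical helper: callers set only what they need; every other field
 * is zero-initialised, and nid defaults to -1 ("no preferred node"). */
static struct hugetlb_alloc_ctxt hugetlb_alloc_ctxt_init(struct hstate *h,
							 unsigned int flags)
{
	struct hugetlb_alloc_ctxt ctxt = { .h = h, .nid = -1, .flags = flags };

	assert(h != NULL);
	return ctxt;
}
```

With something along these lines, a new user would add a field or a flag bit instead of another positional parameter, so existing callers would not need to be touched.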