The Linux Kernel Mailing List
From: Oscar Salvador <osalvador@suse.de>
To: ackerleytng@google.com
Cc: Muchun Song <muchun.song@linux.dev>,
	David Hildenbrand <david@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	fvdl@google.com, jiaqiyan@google.com, joshua.hahnjy@gmail.com,
	jthoughton@google.com, mhocko@kernel.org, michael.roth@amd.com,
	pasha.tatashin@soleen.com, pbonzini@redhat.com,
	peterx@redhat.com, pratyush@kernel.org,
	rick.p.edgecombe@intel.com, rientjes@google.com,
	roman.gushchin@linux.dev, seanjc@google.com,
	shakeel.butt@linux.dev, shivankg@amd.com, vannapurve@google.com,
	yan.y.zhao@intel.com, Dan Williams <djbw@kernel.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 6/6] mm: hugetlb: Refactor out hugetlb_alloc_folio()
Date: Tue, 12 May 2026 15:25:30 +0200
Message-ID: <agMqSlFloJJ22kgB@localhost.localdomain>
In-Reply-To: <20260506-hugetlb-open-up-v2-6-826a0c5f28fc@google.com>

On Wed, May 06, 2026 at 08:54:42AM -0700, Ackerley Tng via B4 Relay wrote:
> From: Ackerley Tng <ackerleytng@google.com>
> 
> Refactor hugetlb_alloc_folio() out of alloc_hugetlb_folio(). The new
> function handles allocating a folio and charging it to the memory and
> HugeTLB cgroups.
> 
> This refactoring removes the following couplings between HugeTLB page
> allocation and VMAs:
> 
> 1. Reservations (as in resv_map) are stored in the vma
> 2. mpol is stored at vma->vm_policy
> 3. A vma must be used for allocation even if the pages are not meant to be
>    used by the host process.
> 
> With these couplings removed, VMAs are no longer a requirement for
> allocation. This opens up the allocation routine for use without VMAs,
> which will allow guest_memfd to use HugeTLB as a more generic allocator of
> huge pages, since guest_memfd memory may not have any associated VMAs by
> design. In addition, direct allocations from HugeTLB could possibly be
> refactored to avoid the use of a pseudo-VMA.
> 
> Also, this decouples HugeTLB page allocation from HugeTLBfs, where the
> subpool is stored at the fs mount. This is also a requirement for
> guest_memfd, where the plan is to have a subpool created per-fd and stored
> on the inode.
> 
> No functional change intended.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

I have yet to review this more thoroughly, but I have a comment below:

> ---
>  include/linux/hugetlb.h |   3 +
>  mm/hugetlb.c            | 179 ++++++++++++++++++++++++++----------------------
>  2 files changed, 100 insertions(+), 82 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 93418625d3c5f..ec205d8580885 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -705,6 +705,9 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
>  int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list);
>  int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
>  void wait_for_freed_hugetlb_folios(void);
> +struct folio *hugetlb_alloc_folio(struct hstate *h, struct hugepage_subpool *spool,
> +		struct mempolicy *mpol, int nid, nodemask_t *nodemask,
> +		bool charge_hugetlb_cgroup_rsvd, bool use_global_reservation);
>  struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  				unsigned long addr, bool cow_from_owner);
>  struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4159b3565a9be..a1c5b94e52e0a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2821,6 +2821,88 @@ void wait_for_freed_hugetlb_folios(void)
>  	flush_work(&free_hpage_work);
>  }
>  
> +struct folio *hugetlb_alloc_folio(struct hstate *h, struct hugepage_subpool *spool,
> +		struct mempolicy *mpol, int nid, nodemask_t *nodemask,
> +		bool charge_hugetlb_cgroup_rsvd, bool use_global_reservation)

I think I would put this information into a context struct that we can
pass to hugetlb_alloc_folio(). Otherwise the signature seems too
overloaded, and we may need to add more parameters in the future to
tweak the allocation even further. E.g.:

 struct hugetlb_alloc_ctxt {
   struct hstate *h;
   struct hugepage_subpool *spool;
   gfp_t gfp_mask;
   ...
 };
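To illustrate, a minimal userspace sketch of such a context struct, with one member per parameter of the hugetlb_alloc_folio() prototype in this patch plus gfp_mask as suggested above. The kernel types are replaced by stand-ins so it compiles on its own; the single-pointer hugetlb_alloc_folio() signature and its mock body are hypothetical, not the patch's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types, only so the sketch compiles. */
#define NUMA_NO_NODE (-1)
typedef unsigned int gfp_t;
typedef struct { unsigned long bits[1]; } nodemask_t;
struct hstate { unsigned int order; };
struct hugepage_subpool { long max_hpages; long used_hpages; };
struct mempolicy { int mode; };
struct folio { unsigned int order; int nid; bool rsvd_charged; };

/* Hypothetical context struct mirroring the patch's parameter list. */
struct hugetlb_alloc_ctxt {
	struct hstate *h;
	struct hugepage_subpool *spool;	/* may be NULL */
	struct mempolicy *mpol;		/* may be NULL */
	int nid;			/* preferred node or NUMA_NO_NODE */
	nodemask_t *nodemask;		/* may be NULL */
	gfp_t gfp_mask;
	bool charge_hugetlb_cgroup_rsvd;
	bool use_global_reservation;
};

/* Mock backing storage so the sketch has something to hand out. */
static struct folio mock_folio;

/*
 * Hypothetical signature: one context pointer instead of seven
 * arguments. The mock body only records what it was asked to do.
 */
struct folio *hugetlb_alloc_folio(const struct hugetlb_alloc_ctxt *ctx)
{
	assert(ctx != NULL && ctx->h != NULL);

	mock_folio.order = ctx->h->order;
	/* The mock simply picks node 0 when no node is preferred. */
	mock_folio.nid = ctx->nid == NUMA_NO_NODE ? 0 : ctx->nid;
	mock_folio.rsvd_charged = ctx->charge_hugetlb_cgroup_rsvd;
	return &mock_folio;
}
```

A caller then names only what it needs, e.g. `struct hugetlb_alloc_ctxt ctx = { .h = h, .spool = spool, .nid = NUMA_NO_NODE };`, and every member it leaves out is zero-initialized (NULL pointers, false booleans).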

Maybe we can go even further and convert those booleans into action flags.
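Converting the two booleans into action flags could look roughly like this. The HUGETLB_ALLOC_* names, the hugetlb_alloc_flags_t type and the helpers are hypothetical; the point is that a call site reads HUGETLB_ALLOC_USE_GLOBAL_RESV instead of a bare `false, true`, and unknown bits can be rejected up front (assert() stands in for what kernel code would do with WARN_ON_ONCE()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical action flags replacing the two bool parameters. */
typedef unsigned int hugetlb_alloc_flags_t;

#define HUGETLB_ALLOC_CHARGE_CGROUP_RSVD (1U << 0) /* was charge_hugetlb_cgroup_rsvd */
#define HUGETLB_ALLOC_USE_GLOBAL_RESV    (1U << 1) /* was use_global_reservation */
#define HUGETLB_ALLOC_VALID_FLAGS \
	(HUGETLB_ALLOC_CHARGE_CGROUP_RSVD | HUGETLB_ALLOC_USE_GLOBAL_RESV)

/* True if @flags contains only bits this version knows about. */
static bool hugetlb_alloc_flags_valid(hugetlb_alloc_flags_t flags)
{
	return (flags & ~HUGETLB_ALLOC_VALID_FLAGS) == 0;
}

/* Decoded view the allocator would branch on internally. */
struct hugetlb_alloc_actions {
	bool charge_rsvd;
	bool use_global_resv;
};

static struct hugetlb_alloc_actions
hugetlb_alloc_decode(hugetlb_alloc_flags_t flags)
{
	/* Reject unknown bits before acting on any of them. */
	assert(hugetlb_alloc_flags_valid(flags));

	/* Converting a nonzero mask result to bool yields true. */
	return (struct hugetlb_alloc_actions){
		.charge_rsvd = flags & HUGETLB_ALLOC_CHARGE_CGROUP_RSVD,
		.use_global_resv = flags & HUGETLB_ALLOC_USE_GLOBAL_RESV,
	};
}
```

New behaviours then become new bits rather than new parameters, so existing callers keep compiling unchanged.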

I have the feeling that, as is, this is quite ad-hoc code. If we want
to open hugetlb allocations up to other users, we should make the API
as generic as possible, so that we do not have to change it whenever a
new user pops up.

 

-- 
Oscar Salvador
SUSE Labs

Thread overview: 12+ messages
2026-05-06 15:54 [PATCH v2 0/6] Open HugeTLB allocation routine for more generic use Ackerley Tng via B4 Relay
2026-05-06 15:54 ` [PATCH v2 1/6] mm: hugetlb: Consolidate interpretation of gbl_chg within alloc_hugetlb_folio() Ackerley Tng via B4 Relay
2026-05-12  9:00   ` Oscar Salvador
2026-05-06 15:54 ` [PATCH v2 2/6] mm: hugetlb: Move mpol interpretation out of alloc_buddy_hugetlb_folio_with_mpol() Ackerley Tng via B4 Relay
2026-05-12 12:51   ` Oscar Salvador
2026-05-06 15:54 ` [PATCH v2 3/6] mm: hugetlb: Move mpol interpretation out of dequeue_hugetlb_folio_vma() Ackerley Tng via B4 Relay
2026-05-12 12:56   ` Oscar Salvador
2026-05-06 15:54 ` [PATCH v2 4/6] mm: hugetlb: Use error variable in alloc_hugetlb_folio Ackerley Tng via B4 Relay
2026-05-06 15:54 ` [PATCH v2 5/6] mm: hugetlb: Move mem_cgroup_charge_hugetlb() earlier in allocation Ackerley Tng via B4 Relay
2026-05-06 15:54 ` [PATCH v2 6/6] mm: hugetlb: Refactor out hugetlb_alloc_folio() Ackerley Tng via B4 Relay
2026-05-12 13:25   ` Oscar Salvador [this message]
2026-05-12 13:17 ` [PATCH v2 0/6] Open HugeTLB allocation routine for more generic use Oscar Salvador
