Re: [RFC PATCH 00/40] mm: reliable 1GB page allocation

From: Usama Arif <usama.arif@linux.dev>
To: Rik van Riel <riel@surriel.com>
Cc: Usama Arif <usama.arif@linux.dev>,
	linux-kernel@vger.kernel.org, kernel-team@meta.com,
	linux-mm@kvack.org, david@kernel.org, willy@infradead.org,
	surenb@google.com, hannes@cmpxchg.org, ljs@kernel.org,
	ziy@nvidia.com, fvdl@google.com
Subject: Re: [RFC PATCH 00/40] mm: reliable 1GB page allocation
Date: Fri, 22 May 2026 04:02:55 -0700
Message-ID: <20260522110257.1640781-1-usama.arif@linux.dev>
In-Reply-To: <20260520150018.2491267-1-riel@surriel.com>

On Wed, 20 May 2026 10:59:06 -0400 Rik van Riel <riel@surriel.com> wrote:

> 
> Some workloads see real performance benefits from using 1GB pages,
> but allocating 1GB pages has often been limited to hugetlb pages
> that were set aside at boot time, or using CMA to keep a fixed
> amount of system memory off limits to the kernel.
> 
> Neither of those are great solutions, given that modern servers
> tend to be large, often run multiple workloads simultaneously,
> and each workload wants something else.
> 
> To address that issue, this patch series divides memory not just
> into 2MB page blocks, but into PUD sized superpageblocks, and
> aggressively tries to steer unmovable, reclaimable, and highatomic
> allocations into those superpageblocks that have already been
> "tainted" by such allocations.
> 
> The goal is to leave as many 1GB superpageblocks as possible
> used by only movable allocations, so they can be easily
> defragmented for either regular PMD sized huge pages, or
> for PUD sized huge pages.
> 
> Various strategies are used to accomplish this goal:
> - unmovable and reclaimable allocations are preferentially
>   done from 1GB blocks that have already been "tainted" by
>   these allocations
> - kernel allocations that can be done as one higher order
>   allocation, or a number of smaller allocations (eg. kvmalloc)
>   will fall back to small pages, rather than taint a new
>   1GB block

Hi Rik!

These comments are based only on the cover letter; I hope to get to
reviewing all the patches.

The point above about kernel allocations falling back to small pages
is interesting.

- Will it result in a performance impact, since kernel allocations
  won't benefit from higher-order allocations?
- Will it hurt 2M THP allocation efficiency due to more
  fragmentation of kernel memory?
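
To make the two questions above concrete, here is a toy userspace
model of the fallback as I read that bullet. It is only a sketch:
TaintedPool, kv_alloc, order_for and the 4 KiB page size are all
hypothetical names and assumptions of mine, not taken from the patches.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def order_for(nbytes):
    """Smallest buddy order whose chunk covers nbytes."""
    pages = max(1, -(-nbytes // PAGE_SIZE))  # ceiling division
    return (pages - 1).bit_length()


@dataclass
class TaintedPool:
    """Free chunks (by buddy order) inside already-tainted 1GB blocks."""
    free_orders: list = field(default_factory=list)

    def alloc(self, order):
        """Take a chunk of `order`, splitting a larger one if needed."""
        fits = sorted(o for o in self.free_orders if o >= order)
        if not fits:
            return False
        have = fits[0]
        self.free_orders.remove(have)
        # Buddy split: the unused halves go back as orders order..have-1.
        self.free_orders.extend(range(order, have))
        return True


def kv_alloc(pool, nbytes):
    """Hypothetical kvmalloc-like caller that avoids tainting a clean block."""
    order = order_for(nbytes)
    if pool.alloc(order):
        return ("contiguous", order)
    pages = -(-nbytes // PAGE_SIZE)
    snapshot = list(pool.free_orders)
    # Fallback: take order-0 pages one at a time from tainted blocks.
    if all(pool.alloc(0) for _ in range(pages)):
        return ("small_pages", pages)
    pool.free_orders = snapshot  # undo the partial fallback
    return ("would_taint_clean_block", order)
```

With free chunks of orders [1, 1, 0, 0] in the tainted blocks, a 16 KiB
request cannot be served as one order-2 chunk and comes back as four
order-0 pages. Those scattered small pages are the case both questions
are about.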


> - movable allocations are preferentially done from clean 1GB
>   blocks, which have only free and movable memory inside,
>   starting with the fullest of these 1GB blocks
> - 2MB allocations follow the same strategy
> - 1GB allocations start with the emptiest clean 1GB block
> - if a 1GB block is mixed, with some movable pageblocks,
>   some free pageblocks, and some unmovable/reclaimable pageblocks,
>   the system has a free threshold below which only unmovable and
>   reclaimable allocations can be done from that 1GB block
> - below that threshold, no new movable allocations are allowed
>   in that 1GB block, while new unmovable/reclaimable allocations
>   are still allowed

By "allowed", do you mean that if movable allocations fail, it will
result in an OOM?
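
A toy model of the threshold rule as I understand it from the cover
letter, to show where the question comes from. The names, the
threshold value, and the 512 pageblocks per 1GB block (2MB pageblocks,
x86-64 sizes) are my assumptions, not the series' code.

```python
from dataclasses import dataclass

PAGEBLOCKS_PER_SPB = 512  # assumption: 1GB / 2MB


@dataclass
class Superpageblock:
    free: int       # free pageblocks
    movable: int    # pageblocks holding movable pages
    unmovable: int  # pageblocks holding unmovable/reclaimable pages


def movable_allowed(spb, free_threshold):
    """May a new movable allocation take a pageblock from this block?"""
    if spb.free == 0:
        return False
    if spb.unmovable == 0:
        return True  # clean block: no threshold applies
    # Mixed block: below the threshold only unmovable/reclaimable may allocate.
    return spb.free >= free_threshold


def pick_for_movable(spbs, free_threshold):
    """Index of the first block that admits a movable allocation, or None."""
    for i, spb in enumerate(spbs):
        if movable_allowed(spb, free_threshold):
            return i
    return None  # the open question: reclaim, relax the rule, or OOM?


def pageblocks_to_evacuate(spb, free_threshold):
    """Movable pageblocks to migrate out to get back to the threshold."""
    if spb.unmovable == 0:
        return 0
    return min(spb.movable, max(0, free_threshold - spb.free))
```

With a hypothetical threshold of 32, a mixed block with 10 free
pageblocks refuses movable allocations and would need 22 pageblocks
evacuated. If every block looks like that, pick_for_movable() returns
None, and that is the case I am asking about.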


> - when a 1GB block is below that threshold, use the migration
>   code to evacuate enough movable memory from the 1GB block
>   to bring free memory in that 1GB block back to the threshold
> 
> These strategies together serve to concentrate unmovable and
> reclaimable allocations in as few 1GB blocks as possible,
> leaving as many 1GB blocks as possible available for movable
> allocations.
> 
> That enables both more extensive use of 2MB THPs and mTHPs,
> as well as reliable allocation of 1GB pages.
> 
> The above strategies also make the core page allocator
> more complicated, and slower. In order to avoid that issue,
> the series is built on top of Johannes's PCPBuddy series,
> which has the goal of reducing how often CPUs need to get
> pages from the zone free lists, instead relying on CPUs
> giving back pages to each other, based on page block ownership.
> 
> TODO:
> - compaction "always" succeeds, with a success rate of 99.96% seen
>   in traces; this sounds great, but it also results in compaction
>   never being throttled, and compaction blowing out everybody's
>   PCP through lru_add_drain() calls. This needs some sort of solution.
> - replace the superpageblock name with something Matthew and David
>   both like
> - find more corner cases, and fix them
> 
> Based on e1914add2799
> 
> 
>
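
For reference, the steering order in the quoted strategy list can be
modelled roughly as below. This is a sketch of my reading only: Spb and
pick are hypothetical names, the threshold rule for mixed blocks is
left out, and the choice of which clean block to taint as a last resort
is not modelled because the cover letter does not say.

```python
from collections import namedtuple

# free/movable/unmovable are counts of 2MB pageblocks in one 1GB block.
Spb = namedtuple("Spb", "name free movable unmovable")


def is_clean(spb):
    """Clean means only free and movable memory inside."""
    return spb.unmovable == 0


def pick(spbs, kind):
    """Name of the preferred 1GB block for a request, or None.

    kind is "unmovable", "movable", "2mb" or "1gb".
    """
    with_room = [s for s in spbs if s.free > 0]
    if kind == "unmovable":
        # Prefer blocks that are already tainted.
        tainted = [s for s in with_room if not is_clean(s)]
        return tainted[0].name if tainted else None
    clean_blocks = [s for s in with_room if is_clean(s)]
    if not clean_blocks:
        return None
    if kind == "1gb":
        # Emptiest clean block: the least movable memory to migrate away.
        return max(clean_blocks, key=lambda s: s.free).name
    # Movable and 2MB requests: fullest clean block that still has room.
    return min(clean_blocks, key=lambda s: s.free).name
```

Packing movable and 2MB allocations into the fullest clean block is
what keeps the emptiest ones available for 1GB requests.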

Thread overview: 50+ messages
2026-05-20 14:59 [RFC PATCH 00/40] mm: reliable 1GB page allocation Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 01/40] mm: page_alloc: replace pageblock_flags bitmap with struct pageblock_data Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 02/40] mm: page_alloc: per-cpu pageblock buddy allocator Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 03/40] mm: page_alloc: split-path PCP free with local-trylock + remote-llist Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 04/40] mm: mm_init: fix zone assignment for pages in unavailable ranges Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 05/40] mm: page_alloc: remove watermark boost mechanism Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 06/40] mm: page_alloc: async evacuation of stolen movable pageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 07/40] mm: page_alloc: track actual page contents in pageblock flags Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 08/40] mm: page_alloc: superpageblock metadata for 1GB anti-fragmentation Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 09/40] mm: page_alloc: support superpageblock resize for memory hotplug Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 10/40] mm: page_alloc: add superpageblock fullness lists for allocation steering Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 11/40] mm: page_alloc: steer pageblock stealing to tainted superpageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 12/40] mm: page_alloc: steer movable allocations to fullest clean superpageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 13/40] mm: page_alloc: extract claim_whole_block from try_to_claim_block Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 14/40] mm: page_alloc: add per-superpageblock free lists Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 15/40] mm: page_alloc: add background superpageblock defragmentation worker Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 16/40] mm: compaction: walk per-superpageblock free lists for migration targets Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 17/40] mm: page_alloc: superpageblock-aware contiguous and higher order allocation Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 18/40] mm: page_alloc: prevent atomic allocations from tainting clean SPBs Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 19/40] mm: page_alloc: aggressively pack non-movable allocs in tainted SPBs on large systems Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 20/40] mm: page_alloc: prefer reclaim over tainting clean superpageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 21/40] mm: page_alloc: adopt partial pageblocks from tainted superpageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 22/40] mm: page_alloc: add CONFIG_DEBUG_VM sanity checks for SPB counters Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 23/40] mm: page_alloc: targeted evacuation and dynamic reserves for tainted SPBs Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 24/40] mm: page_alloc: prevent UNMOVABLE/RECLAIMABLE mixing in pageblocks Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 25/40] mm: trigger deferred SPB evac when atomic allocs would taint a clean SPB Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 26/40] mm: page_alloc: refuse fragmenting fallback for callers with cheap fallback Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 27/40] mm: page_alloc: cross-migratetype buddy borrow within tainted SPBs Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 28/40] mm: page_alloc: drive slab shrink from SPB anti-fragmentation pressure Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 29/40] mm: page_reporting: walk per-superpageblock free lists Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 30/40] mm: show_mem: collect migratetype letters from per-superpageblock lists Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 31/40] mm: page_alloc: per-(zone, order, mt) PASS_1 hint cache Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 32/40] mm: debug: prevent infinite recursion in dump_page() with CMA Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 33/40] PM: hibernate: walk per-superpageblock free lists in mark_free_pages Rik van Riel
2026-05-20 18:19   ` Rafael J. Wysocki
2026-05-20 14:59 ` [RFC PATCH 34/40] btrfs: allocate eb-attached btree pages as movable Rik van Riel
2026-05-20 17:47   ` Boris Burkov
2026-05-23 15:58     ` David Sterba
2026-05-24  1:43       ` Rik van Riel
2026-05-24 19:59         ` Matthew Wilcox
2026-05-25  6:57           ` Christoph Hellwig
2026-05-20 14:59 ` [RFC PATCH 35/40] mm: page_alloc: refuse best-effort high-order allocs servable at lower orders Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 36/40] mm: page_alloc: set ALLOC_NOFRAGMENT on alloc_frozen_pages_nolock_noprof Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 37/40] mm: page_alloc: move spb_get_category and spb_tainted_reserve to mmzone.h Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 38/40] mm: compaction: skip empty tainted superpageblocks as migration source Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 39/40] mm: compaction: respect tainted SPB reserve in destination selection Rik van Riel
2026-05-20 14:59 ` [RFC PATCH 40/40] mm: page_alloc: SPB tracepoint instrumentation [DO-NOT-MERGE] Rik van Riel
2026-05-21  7:39 ` [syzbot ci] Re: mm: reliable 1GB page allocation syzbot ci
2026-05-22 11:02 ` Usama Arif [this message]
2026-05-22 13:55   ` [RFC PATCH 00/40] " Rik van Riel
