From: Pratyush Yadav <pratyush@kernel.org>
To: Mike Rapoport <rppt@kernel.org>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Pratyush Yadav <pratyush@kernel.org>,
	Alexander Graf <graf@amazon.com>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jason Miu <jasonmiu@google.com>,
	Jork Loeser <jloeser@linux.microsoft.com>
Cc: kexec@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 00/18] kho: make boot time huge page allocation work nicely with KHO
Date: Fri,  5 Jun 2026 20:34:33 +0200
Message-ID: <20260605183501.3884950-1-pratyush@kernel.org>

From: "Pratyush Yadav (Google)" <pratyush@kernel.org>

Hi,

Boot time gigantic huge page allocation currently does not work well
with KHO, for two reasons.

First, it breaks scratch size accounting. Gigantic pages are allocated
using the memblock alloc APIs, so they count towards RSRV_KERN, and thus
towards the scratch size when using scratch_scale. This means that if
huge pages take a large enough chunk of system memory, the scratch size
will blow up and the scratch allocation will fail.
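
To make the accounting problem concrete, here is a minimal userspace
sketch. It assumes the scratch size is simply a percentage
(scratch_scale) of the kernel-reserved memory; the helper names and the
exact formula are illustrative assumptions, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/*
 * Hypothetical model: scratch size is scale_percent of the memory
 * accounted as kernel-reserved (RSRV_KERN in the text above).
 */
static uint64_t scratch_size(uint64_t rsrv_kern_bytes, unsigned int scale_percent)
{
	assert(scale_percent > 0);
	return rsrv_kern_bytes * scale_percent / 100;
}

/* The scratch allocation can only succeed if it fits in system memory. */
static bool scratch_fits(uint64_t total_mem_bytes, uint64_t rsrv_kern_bytes,
			 unsigned int scale_percent)
{
	return scratch_size(rsrv_kern_bytes, scale_percent) <= total_mem_bytes;
}
```

With a scale of 200 on a 128 GiB machine, 2 GiB of ordinary reserved
memory asks for 4 GiB of scratch. If 96 GiB of gigantic pages are
counted as well, the request grows to 196 GiB, which can never be
satisfied.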

Second, scratch cannot contain preserved memory, so if hugepages are
allocated from scratch, they will fail to be preserved with the upcoming
hugetlb preservation series [0].

Fix this by introducing the concept of extended scratch areas. These are
areas that the kernel discovers on boot by walking the radix tree and
finding free memory ranges. See patch 15 for more details.
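
As a rough illustration of "finding free memory ranges", the sketch
below computes the gaps between preserved ranges. It assumes the
preserved ranges have already been collected into a sorted,
non-overlapping array; struct range and find_free_ranges() are
hypothetical names, and the real series walks the KHO radix tree
instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical range [start, end). Hypothetical type for this sketch. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Write the gaps between preserved ranges inside [lo, hi) to out.
 * preserved must be sorted by start and non-overlapping.
 * Returns the number of gaps written (at most out_cap).
 */
static size_t find_free_ranges(const struct range *preserved, size_t n,
			       uint64_t lo, uint64_t hi,
			       struct range *out, size_t out_cap)
{
	uint64_t cursor = lo;
	size_t count = 0;

	assert(lo <= hi);

	for (size_t i = 0; i < n; i++) {
		if (preserved[i].end <= cursor)
			continue;	/* entirely below the window */
		if (preserved[i].start >= hi)
			break;		/* entirely above the window */

		if (preserved[i].start > cursor && count < out_cap) {
			out[count].start = cursor;
			out[count].end = preserved[i].start;
			count++;
		}
		cursor = preserved[i].end;
	}

	/* Tail gap after the last preserved range. */
	if (cursor < hi && count < out_cap) {
		out[count].start = cursor;
		out[count].end = hi;
		count++;
	}

	return count;
}
```

For preserved ranges [0x1000, 0x2000) and [0x4000, 0x5000) in a window
of [0, 0x8000), this yields three free ranges: [0, 0x1000),
[0x2000, 0x4000) and [0x5000, 0x8000).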

Discovering the scratch areas needs some preparatory changes to KHO, the
radix tree APIs, and to memblock. Patches 1-14 do that.

Patch 15 adds the scratch discovery logic.

Patch 16 adds the dedicated memblock hugetlb allocator.

Patches 17-18 fix the scratch size calculation when using scratch_scale.

[0] https://lore.kernel.org/linux-mm/20251206230222.853493-1-pratyush@kernel.org/T/#u

Changes in v2:

Detailed changelog below.

At a high level, the major change in this version is the removal of
MEMBLOCK_KHO_SCRATCH_EXT. MEMBLOCK_KHO_SCRATCH stays as the only memory
type and the discovered areas are marked with it. For HugeTLB, there is
now a dedicated allocation routine that retries if the allocated memory
lands in scratch. MEMBLOCK_RSRV_HUGETLB is also introduced to help with
accounting of scratch area sizes.
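
The "retry if it lands in scratch" idea can be sketched in userspace as
below. This is only a model under stated assumptions: candidates are
tried top-down in size-aligned steps, and alloc_outside_scratch() and
struct range are hypothetical names. How the real
memblock_alloc_hugetlb() picks the next candidate is described in
patch 16, not here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/* Half-open physical range [start, end). Hypothetical type for this sketch. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Does [start, end) intersect any of the n scratch ranges? */
static bool overlaps_scratch(const struct range *scratch, size_t n,
			     uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++) {
		if (start < scratch[i].end && scratch[i].start < end)
			return true;
	}
	return false;
}

/*
 * Find a size-aligned block of size bytes in [lo, hi), searching top-down,
 * and retry lower whenever a candidate lands in scratch.
 * Returns the block's start address, or 0 if nothing fits.
 */
static uint64_t alloc_outside_scratch(uint64_t lo, uint64_t hi, uint64_t size,
				      const struct range *scratch, size_t n)
{
	uint64_t addr;

	assert(size > 0 && lo > 0 && lo < hi);
	if (hi - lo < size)
		return 0;

	addr = hi - size;
	addr -= addr % size;	/* align down to the page size */

	for (;;) {
		if (addr < lo)
			return 0;
		if (!overlaps_scratch(scratch, n, addr, addr + size))
			return addr;
		if (addr < size)
			return 0;
		addr -= size;	/* candidate was in scratch, retry below it */
	}
}
```

With usable memory [1 GiB, 8 GiB), 1 GiB pages and scratch at
[6 GiB, 8 GiB), the candidates at 7 GiB and 6 GiB are rejected and the
page ends up at 5 GiB.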

- Fixup commit message in patch 1 to make namespacing change clearer.
- Use @key in kernel-doc for radix functions.
- Add a runtime check on key width.
- Move all mem retrieval logic to kho_mem_retrieve().
- Add a comment in kho_mem_retrieve() explaining why mem_map won't be NULL.
- Rename callbacks to ->leaf() and ->node().
- Fixup commit messages.
- Clear tree->root in kho_radix_destroy_tree(). This lets the tree be
  re-initialized by calling kho_radix_init_tree().
- Add kho_get_mem_map() earlier in the series.
- Export kho_scratch_overlap() and use it in memblock_is_kho_scratch_memory().
- Get rid of MEMBLOCK_KHO_SCRATCH_EXT.
- Introduce MEMBLOCK_RSRV_HUGETLB.
- Introduce memblock_alloc_hugetlb() for hugetlb bootmem allocations.
- Refactor memblock_reserved_kern_size() to allow calculating size by flags.
- Exclude hugetlb memory from scratch size calculation.
- Collect R-bys.

Regards,
Pratyush Yadav

Pratyush Yadav (Google) (18):
  kho: generalize radix tree APIs
  kho: disallow wide keys in radix tree
  kho: return virtual address of mem_map
  kho: store incoming radix tree in kho_in
  kho: move all memory retrieval logic to kho_mem_retrieve()
  kho: add a struct for radix callbacks
  kho: add callback for table pages
  kho: add data argument to radix walk callback
  kho: allow early-boot usage of the KHO radix tree
  kho: allow destroying KHO radix tree
  kho: add kho_radix_init_tree()
  kho: export kho_scratch_overlap()
  kho: initialize kho_scratch pointer earlier in boot
  memblock: use kho_scratch_overlap() to decide migratetype
  kho: extend scratch
  memblock: make HugeTLB bootmem allocation work with KHO
  memblock: allow calculating reserved size by flags
  kho: exclude hugetlb memory from scratch size calculation

 include/linux/kexec_handover.h              |  10 +
 include/linux/kho/abi/kexec_handover.h      |   8 +
 include/linux/kho_radix_tree.h              |  44 +-
 include/linux/memblock.h                    |   9 +-
 kernel/liveupdate/Makefile                  |   1 -
 kernel/liveupdate/kexec_handover.c          | 495 +++++++++++++++-----
 kernel/liveupdate/kexec_handover_debug.c    |  25 -
 kernel/liveupdate/kexec_handover_internal.h |   9 -
 mm/hugetlb.c                                |  22 +-
 mm/memblock.c                               | 120 ++++-
 mm/mm_init.c                                |   1 +
 11 files changed, 540 insertions(+), 204 deletions(-)
 delete mode 100644 kernel/liveupdate/kexec_handover_debug.c


base-commit: 2935777b418d2bfcbfe96705bb2c0fa6c0d94e18
-- 
2.54.0.1032.g2f8565e1d1-goog