From: Kit Dallege <xaum.io@gmail.com>
To: akpm@linux-foundation.org, david@kernel.org, corbet@lwn.net
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
Kit Dallege <xaum.io@gmail.com>
Subject: [PATCH] Docs/mm: document Boot Memory
Date: Sat, 14 Mar 2026 16:25:27 +0100
Message-ID: <20260314152527.100295-1-xaum.io@gmail.com>

Fill in the bootmem.rst stub created in commit 481cc97349d6
("mm,doc: Add new documentation structure") as part of
the structured memory management documentation following
Mel Gorman's book outline.

Signed-off-by: Kit Dallege <xaum.io@gmail.com>
---
Documentation/mm/bootmem.rst | 139 +++++++++++++++++++++++++++++++++++
1 file changed, 139 insertions(+)
diff --git a/Documentation/mm/bootmem.rst b/Documentation/mm/bootmem.rst
index eb2b31eedfa1..b20520f53603 100644
--- a/Documentation/mm/bootmem.rst
+++ b/Documentation/mm/bootmem.rst
@@ -3,3 +3,142 @@
===========
Boot Memory
===========
+
+The kernel needs a memory allocator long before the page allocator is ready.
+The memblock allocator fills this role, managing physical memory from the
+earliest stages of boot until the buddy allocator takes over. The
+implementation is in ``mm/memblock.c`` and ``mm/mm_init.c``.
+
+.. contents:: :local:
+
+Memblock
+========
+
+Memblock tracks physical memory as two arrays of regions: ``memory`` (all
+usable RAM reported by firmware) and ``reserved`` (memory already allocated
+or otherwise unavailable). A free page is one that appears in ``memory``
+but not in ``reserved``. These two arrays, along with global state such as
+the allocation direction and address limit, are held in a single
+``struct memblock`` instance.
+
+Each region is a ``struct memblock_region`` recording a base address, size,
+NUMA node ID, and a set of flags:
+
+- **HOTPLUG**: memory that may be physically removed at runtime.
+- **MIRROR**: memory with hardware mirroring for reliability.
+- **NOMAP**: memory that should not be directly mapped by the kernel
+  (e.g., firmware-reserved ranges that are usable but not mappable).
+- **DRIVER_MANAGED**: memory whose lifecycle is managed by a device driver.
+
+Region Management
+-----------------
+
+Firmware and architecture code populate the arrays early in boot.
+``memblock_add()`` registers a range of usable RAM. ``memblock_reserve()``
+marks a range as taken — this is used for the kernel image itself, device
+tree blobs, initrd, and other early allocations.
+
+When regions are added, overlapping ranges are merged automatically, as are
+adjacent ranges with the same node ID and flags. ``memblock_add_range()``
+handles insertion, overlap detection, and merging. If the region array is
+full, it is doubled in size, using memblock itself to allocate the new array.
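The insert-and-merge behaviour described above can be sketched in userspace C. This is a toy model under stated assumptions, not the code in ``mm/memblock.c``: ``toy_region``, ``toy_type`` and ``toy_add_range()`` are hypothetical names, node IDs and flags are ignored, and the array is fixed-size rather than doubled on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REGIONS 16

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct toy_region {
	uint64_t base;
	uint64_t size;
};

struct toy_type {
	size_t cnt;
	struct toy_region regions[TOY_MAX_REGIONS];
};

/*
 * Add [base, base + size) to a sorted, non-overlapping array, absorbing
 * every region it overlaps or touches. Returns 0 on success, -1 if the
 * fixed-size array would overflow (the toy never grows it).
 */
static int toy_add_range(struct toy_type *t, uint64_t base, uint64_t size)
{
	struct toy_region out[TOY_MAX_REGIONS + 1];
	uint64_t end = base + size;
	size_t n = 0, i;
	int placed = 0;

	if (size == 0)
		return 0;

	for (i = 0; i < t->cnt; i++) {
		uint64_t rbase = t->regions[i].base;
		uint64_t rend = rbase + t->regions[i].size;

		if (rend < base) {
			out[n++] = t->regions[i];	/* entirely below */
		} else if (rbase > end) {
			if (!placed) {			/* emit merged range once */
				out[n].base = base;
				out[n].size = end - base;
				n++;
				placed = 1;
			}
			out[n++] = t->regions[i];	/* entirely above */
		} else {
			if (rbase < base)		/* overlaps or touches */
				base = rbase;
			if (rend > end)
				end = rend;
		}
	}
	if (!placed) {
		out[n].base = base;
		out[n].size = end - base;
		n++;
	}
	if (n > TOY_MAX_REGIONS)
		return -1;

	for (i = 0; i < n; i++)
		t->regions[i] = out[i];
	t->cnt = n;
	return 0;
}
```

Adding a range that bridges two existing regions collapses all three into one entry, which is why the array stays small even when firmware reports memory in many pieces.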
+
+``memblock_remove()`` deletes a range from the ``memory`` array (used when
+firmware reports memory that turns out to be unusable).
+``memblock_phys_free()`` removes a range from ``reserved``, making it
+available for allocation again.
+
+Allocation
+----------
+
+Memblock allocation scans the ``memory`` array for a range that does not
+overlap ``reserved``, respecting NUMA node affinity and a configurable
+address limit (``memblock.current_limit``).
+
+The search can run in two directions:
+
+- **Top-down** (default): allocates from the highest available address.
+  This keeps low memory free for devices with addressing limitations.
+- **Bottom-up**: allocates from the lowest available address. Used when
+  early allocations must stay near the kernel image (e.g. ``movable_node``).
+
+Once a suitable range is found it is added to ``reserved``. The main
+allocation functions are ``memblock_alloc()`` for virtual addresses and
+``memblock_phys_alloc()`` for physical addresses. Both support NUMA-aware
+variants that prefer a specific node.
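A minimal userspace sketch of the top-down search, assuming the free ranges (memory minus reserved) have already been computed and sorted by address. ``toy_range`` and ``toy_find_top_down()`` are hypothetical names for illustration, not kernel symbols; NUMA node and flag filtering are left out, and a return value of 0 stands for "nothing fits".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end); not a kernel type. */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the highest align-aligned address below limit at which size
 * bytes fit inside one free range, or 0 if nothing fits. The ranges
 * must be sorted by address and align must be a power of two.
 */
static uint64_t toy_find_top_down(const struct toy_range *free_ranges,
				  size_t n, uint64_t size, uint64_t align,
				  uint64_t limit)
{
	size_t i = n;

	while (i-- > 0) {
		uint64_t start = free_ranges[i].start;
		uint64_t end = free_ranges[i].end;
		uint64_t cand;

		if (end > limit)		/* clamp to the address limit */
			end = limit;
		if (end <= start || end - start < size)
			continue;

		cand = (end - size) & ~(align - 1);	/* round down */
		if (cand >= start)
			return cand;
	}
	return 0;
}
```

A bottom-up search would walk the same array from index 0 and round the candidate up instead of down. In both cases the caller would then record the chosen range as reserved.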
+
+Iteration
+---------
+
+Memblock provides iterator macros for walking memory ranges:
+
+- ``for_each_mem_range()`` iterates over the ranges in ``memory``.
+- ``for_each_free_mem_range()`` iterates over free ranges (memory minus
+  reserved).
+- ``for_each_reserved_mem_region()`` iterates over reserved ranges.
+- ``for_each_mem_pfn_range()`` iterates by page frame number, which is
+  used heavily during page and zone initialization.
+
+The free-range iterator subtracts reserved regions from memory regions
+internally, presenting the caller with a simple sequence of available ranges.
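The subtraction of reserved regions from memory regions can be pictured as a merge-style walk over two sorted arrays. The sketch below is a userspace toy under that assumption; ``toy_range`` and ``toy_free_ranges()`` are hypothetical names, and it collects results into an array instead of yielding them one at a time as an iterator macro would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end); not a kernel type. */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Store the parts of mem[] not covered by rsv[] into out[], writing at
 * most max entries. Both inputs must be sorted and non-overlapping.
 * Returns the number of free ranges found, which may exceed max.
 */
static size_t toy_free_ranges(const struct toy_range *mem, size_t nmem,
			      const struct toy_range *rsv, size_t nrsv,
			      struct toy_range *out, size_t max)
{
	size_t n = 0, m, r = 0;

	for (m = 0; m < nmem; m++) {
		uint64_t cur = mem[m].start;
		size_t k;

		/* Drop reserved ranges that end before this memory range. */
		while (r < nrsv && rsv[r].end <= cur)
			r++;

		/* Cut out every reserved range that intersects it. */
		for (k = r; k < nrsv && rsv[k].start < mem[m].end; k++) {
			if (rsv[k].start > cur) {
				if (n < max) {
					out[n].start = cur;
					out[n].end = rsv[k].start;
				}
				n++;
			}
			if (rsv[k].end > cur)
				cur = rsv[k].end;
		}

		/* Whatever is left after the last reservation is free. */
		if (cur < mem[m].end) {
			if (n < max) {
				out[n].start = cur;
				out[n].end = mem[m].end;
			}
			n++;
		}
	}
	return n;
}
```

The inner index ``k`` is kept separate from ``r`` because one reserved range may straddle two memory ranges and has to be seen again for the next one.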
+
+Transition to the Page Allocator
+================================
+
+Once the buddy allocator is initialized, memblock releases its free pages
+via ``memblock_free_all()``. This walks all free ranges and hands each
+page to the buddy allocator. After this point memblock is no longer used
+for allocation. Unless the architecture selects
+``CONFIG_ARCH_KEEP_MEMBLOCK``, the memblock arrays themselves are returned
+to the page allocator via ``memblock_discard()``.
+
+Named Reservations
+------------------
+
+The ``reserve_mem`` kernel command line parameter lets the user reserve
+named memory regions at boot; these can be preserved across kexec. They
+are tracked separately and can be looked up by name at runtime with
+``reserve_mem_find_by_name()``.
+
+Page and Zone Initialization
+============================
+
+``mm/mm_init.c`` bridges memblock and the page allocator. Its primary
+responsibilities are determining zone boundaries and initializing
+``struct page`` for every physical page frame.
+
+Zone Topology
+-------------
+
+The function ``free_area_init()`` is called by architecture code to set up
+nodes and zones. It calculates zone boundaries based on architectural
+constraints (which address ranges can be used for DMA, which are always
+mapped, etc.) and kernel command line parameters:
+
+- ``kernelcore=`` sets the amount of memory that must be in non-movable
+  zones.
+- ``movablecore=`` sets the amount of memory to place in ``ZONE_MOVABLE``.
+- ``movable_node`` allows entire NUMA nodes to be treated as movable.
+- ``kernelcore=mirror`` restricts non-movable memory to mirrored regions.
+
+These parameters control the boundary between ``ZONE_MOVABLE`` and the
+other zones, which in turn affects how much memory is available for
+transparent huge pages, memory hot-remove, and CMA.
+
+Struct Page Initialization
+--------------------------
+
+Every physical page frame needs an initialized ``struct page`` before the
+page allocator can manage it. On small systems this is done synchronously
+during boot. On large systems with hundreds of gigabytes of RAM, this
+initialization can take a significant amount of time.
+
+With ``CONFIG_DEFERRED_STRUCT_PAGE_INIT``, only pages in the boot node's
+lower zones are initialized during early boot — enough to get the system
+running. The remaining pages are initialized in parallel by worker threads
+(via the padata framework) before they are first needed. This can save
+several seconds of boot time on large NUMA systems.
+
+Each page is initialized by setting its flags, reference count, and links
+to the owning node and zone. Pages in memory holes or ``NOMAP`` regions
+are marked as reserved and are never handed to the page allocator.
--
2.53.0