From: Dave Jiang <dave.jiang@intel.com>
To: Gregory Price <gourry@gourry.net>, linux-cxl@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, dave@stgolabs.net,
	jonathan.cameron@huawei.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	dan.j.williams@intel.com, corbet@lwn.net
Subject: Re: [PATCH v3 08/17] cxl: docs/linux - early boot configuration
Date: Tue, 13 May 2025 10:56:45 -0700
Message-ID: <c940d96a-d021-44e1-85e9-362ae4dd8d74@intel.com>
In-Reply-To: <20250512162134.3596150-9-gourry@gourry.net>



On 5/12/25 9:21 AM, Gregory Price wrote:
> Document __init time configurations that affect CXL driver probe
> process and memory region configuration.
> 
> Signed-off-by: Gregory Price <gourry@gourry.net>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  Documentation/driver-api/cxl/index.rst        |   1 +
>  .../driver-api/cxl/linux/early-boot.rst       | 131 ++++++++++++++++++
>  2 files changed, 132 insertions(+)
>  create mode 100644 Documentation/driver-api/cxl/linux/early-boot.rst
> 
> diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
> index bc2228c77c32..d2eefe575604 100644
> --- a/Documentation/driver-api/cxl/index.rst
> +++ b/Documentation/driver-api/cxl/index.rst
> @@ -34,6 +34,7 @@ that have impacts on each other.  The docs here break up configurations steps.
>     :caption: Linux Kernel Configuration
>  
>     linux/overview
> +   linux/early-boot
>     linux/access-coordinates
>  
>  
> diff --git a/Documentation/driver-api/cxl/linux/early-boot.rst b/Documentation/driver-api/cxl/linux/early-boot.rst
> new file mode 100644
> index 000000000000..8c1c497bc772
> --- /dev/null
> +++ b/Documentation/driver-api/cxl/linux/early-boot.rst
> @@ -0,0 +1,131 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=======================
> +Linux Init (Early Boot)
> +=======================
> +
> +Linux configuration is split into two major steps: Early-Boot and everything else.
> +
> +During early boot, Linux sets up immutable resources (such as NUMA nodes), while
> +later operations include things like driver probe and memory hotplug.  Linux may
> +read EFI and ACPI information throughout this process to configure logical
> +representations of the devices.
> +
> +During the Linux early boot stage (kernel functions marked with the
> +:code:`__init` annotation), the system takes the resources created by EFI/BIOS
> +(ACPI tables) and turns them into resources that the kernel can consume.
> +
> +
> +BIOS, Build and Boot Options
> +============================
> +
> +There are 4 options, set in BIOS, at kernel build time, or on the kernel command
> +line, which dictate how memory will be managed by Linux during early boot.
> +
> +* EFI_MEMORY_SP
> +
> +  * BIOS/EFI Option that dictates whether memory is SystemRAM or
> +    Specific Purpose.  Specific Purpose memory will be deferred to
> +    drivers to manage - and not immediately exposed as system RAM.
> +
> +* CONFIG_EFI_SOFT_RESERVE
> +
> +  * Linux Build config option that dictates whether the kernel supports
> +    Specific Purpose memory.
> +
> +* CONFIG_MHP_DEFAULT_ONLINE_TYPE
> +
> +  * Linux Build config that dictates whether and how Specific Purpose memory
> +    converted to a dax device should be managed (left as DAX or onlined as
> +    SystemRAM in ZONE_NORMAL or ZONE_MOVABLE).
> +
> +* nosoftreserve
> +
> +  * Linux kernel boot option that dictates whether Soft Reserve should be
> +    supported.  Similar to CONFIG_EFI_SOFT_RESERVE.
> +
> +Memory Map Creation
> +===================
> +
> +While the kernel parses the EFI memory map, if :code:`Specific Purpose` memory
> +is supported and detected, it will set this region aside as
> +:code:`SOFT_RESERVED`.
> +
> +If :code:`EFI_MEMORY_SP=0`, :code:`CONFIG_EFI_SOFT_RESERVE=n`, or the
> +:code:`nosoftreserve` boot option is set, Linux will default a CXL device memory
> +region to SystemRAM.  This will expose the memory to the kernel page allocator
> +in :code:`ZONE_NORMAL`, making it available for most allocations (including
> +:code:`struct page` and page tables).
> +
> +If :code:`Specific Purpose` is set and supported,
> +:code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE_*` dictates whether the memory is onlined
> +by default (:code:`_OFFLINE` or :code:`_ONLINE_*`), and if online which zone to
> +online this memory to by default (:code:`_NORMAL` or :code:`_MOVABLE`).
> +
> +If placed in :code:`ZONE_MOVABLE`, the memory will not be available for most
> +kernel allocations (such as :code:`struct page` or page tables).  This may
> +significantly impact performance depending on the memory capacity of the system.
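
For readers following along, the rule described above can be modeled roughly
as below. This is a userspace sketch under stated assumptions, not kernel
code: the type and function names (region_fate, boot_inputs, classify_region)
are hypothetical and exist only for illustration. It encodes just the
three-way condition from the text.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these exist in the kernel. */
enum region_fate {
	REGION_SYSTEM_RAM,	/* handed to the page allocator at boot */
	REGION_SOFT_RESERVED,	/* set aside for a driver to manage */
};

struct boot_inputs {
	bool efi_memory_sp;		/* firmware marked the range Specific Purpose */
	bool config_efi_soft_reserve;	/* CONFIG_EFI_SOFT_RESERVE=y */
	bool nosoftreserve;		/* nosoftreserve given at boot */
};

/*
 * The range is soft reserved only when firmware asked for it, the kernel
 * was built with support, and the boot option did not disable it.
 * In every other case it falls back to SystemRAM.
 */
static enum region_fate classify_region(const struct boot_inputs *in)
{
	if (in->efi_memory_sp && in->config_efi_soft_reserve &&
	    !in->nosoftreserve)
		return REGION_SOFT_RESERVED;
	return REGION_SYSTEM_RAM;
}
```

Any single "off" input is enough to send the range to SystemRAM, which is
why all three settings need to be checked together when CXL memory shows up
in the page allocator unexpectedly.
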
> +
> +
> +NUMA Node Reservation
> +=====================
> +
> +Linux refers to the proximity domains (:code:`PXM`) defined in the SRAT to
> +create NUMA nodes in :code:`acpi_numa_init`. Typically, there is a 1:1 relation
> +between :code:`PXM` and NUMA node IDs.
> +
> +SRAT is the only ACPI defined way of defining Proximity Domains. Linux chooses
> +to, at most, map those 1:1 with NUMA nodes. CEDT adds a description of SPA
> +ranges which Linux may wish to map to one or more NUMA nodes.
> +
> +If there are CXL ranges in the CFMWS but not in SRAT, then a fake :code:`PXM`
> +is created (as of v6.15). In the future, Linux may reject CFMWS not described
> +by SRAT due to the ambiguity of proximity domain association.
> +
> +It is important to note that NUMA node creation cannot be done at runtime. All
> +possible NUMA nodes are identified at :code:`__init` time, more specifically
> +during :code:`mm_init`. The CEDT and SRAT must contain sufficient :code:`PXM`
> +data for Linux to identify NUMA nodes and their associated memory regions.
> +
> +The relevant code exists in: :code:`linux/drivers/acpi/numa/srat.c`.
> +
> +See the Example Platform Configurations section for more information.
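
The node accounting described above can be pictured with a small userspace
model. This is only a sketch, not the logic in srat.c: the structures and
function names are hypothetical, and it assumes a CFMWS window counts as
"described" only when a single SRAT entry fully contains it. The kernel's
handling of partially covered windows is more involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for SRAT memory affinity and CFMWS entries. */
struct srat_mem { uint64_t start, end; int pxm; };	/* end is exclusive */
struct cfmws_win { uint64_t start, end; };		/* end is exclusive */

/* Count distinct PXMs described by SRAT; each one maps to one NUMA node. */
static int count_srat_pxms(const struct srat_mem *s, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		int seen = 0;

		for (size_t j = 0; j < i; j++)
			if (s[j].pxm == s[i].pxm)
				seen = 1;
		if (!seen)
			count++;
	}
	return count;
}

/* Assumption: a window is described only if one SRAT entry fully contains it. */
static int window_in_srat(const struct cfmws_win *w,
			  const struct srat_mem *s, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (w->start >= s[i].start && w->end <= s[i].end)
			return 1;
	return 0;
}

/* Nodes to reserve at init: SRAT PXMs plus one fake PXM per uncovered window. */
static int count_possible_nodes(const struct srat_mem *s, size_t ns,
				const struct cfmws_win *w, size_t nw)
{
	int nodes = count_srat_pxms(s, ns);

	for (size_t i = 0; i < nw; i++)
		if (!window_in_srat(&w[i], s, ns))
			nodes++;
	return nodes;
}
```

With two SRAT entries in PXM 0 and PXM 1 and two CFMWS windows, one of which
SRAT already covers, the model reserves three nodes: two from SRAT and one
fake PXM for the uncovered window.
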
> +
> +Memory Tiers Creation
> +=====================
> +Memory tiers are a collection of NUMA nodes grouped by performance characteristics.
> +During :code:`__init`, Linux initializes the system with a default memory tier that
> +contains all nodes marked :code:`N_MEMORY`.
> +
> +:code:`memory_tier_init` is called at boot for all nodes with memory online by
> +default. :code:`memory_tier_late_init` is called during late-init for nodes set
> +up during driver configuration.
> +
> +Nodes are only marked :code:`N_MEMORY` if they have *online* memory.
> +
> +Tier membership can be inspected in ::
> +
> +  /sys/devices/virtual/memory_tiering/memory_tierN/nodelist
> +  0-1
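
If a tool needs to consume that nodelist, the value can be parsed roughly as
below. This is a userspace sketch, not part of the patch: parse_nodelist is a
hypothetical helper, and it assumes the usual list format of comma-separated
node IDs and inclusive ranges (for example "0-1" or "0,2-3").

```c
#include <stdlib.h>

/*
 * Parse a node list such as "0-1" or "0,2-3" into ids[].
 * Returns the number of nodes stored, or -1 on malformed input
 * or if ids[] (capacity max) is too small.
 */
static int parse_nodelist(const char *s, int *ids, int max)
{
	int n = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s || lo < 0)
			return -1;
		if (*end == '-') {
			/* Inclusive range "lo-hi" */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
		}
		for (long i = lo; i <= hi; i++) {
			if (n == max)
				return -1;
			ids[n++] = (int)i;
		}
		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		s = end;
	}
	return n;
}
```

Reading the sysfs file into a buffer and passing it to this helper yields the
node IDs in a tier, which can then be compared against the nodes expected to
hold DRAM versus CXL memory.
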
> +
> +If nodes with clearly different performance are grouped into the same tier,
> +check the HMAT and CDAT information for the CXL nodes.  All nodes default to the
> +DRAM tier, unless HMAT/CDAT information is reported to the memory_tier component
> +via :code:`access_coordinates`.
> +
> +Contiguous Memory Allocation
> +============================
> +The contiguous memory allocator (CMA) enables reservation of contiguous memory
> +regions on NUMA nodes during early boot.  However, CMA cannot reserve memory
> +on NUMA nodes that are not online during early boot. ::
> +
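> +  /* Simplified excerpt; the real function does more than this. */
> +  void __init hugetlb_cma_reserve(int order) {
> +    int nid;
> +    for (nid = 0; nid < MAX_NUMNODES; nid++) {
> +      if (!node_online(nid))
> +        continue; /* do not allow reservations on this node */
> +    }
> +  }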
> +
> +This means if users intend to defer management of CXL memory to the driver, CMA
> +cannot be used to guarantee huge page allocations.  If enabling CXL memory as
> +SystemRAM in :code:`ZONE_NORMAL` during early boot, per-node CMA reservations
> +can be made with the :code:`cma_pernuma` or :code:`numa_cma` kernel command line
> +parameters.

