From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dan Williams <dan.j.williams@intel.com>,
<rafael.j.wysocki@intel.com>, <linux-cxl@vger.kernel.org>
Cc: Alison Schofield <alison.schofield@intel.com>,
<linux-acpi@vger.kernel.org>
Subject: Re: [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
Date: Thu, 18 Nov 2021 13:12:18 +0000 [thread overview]
Message-ID: <20211118131218.00005645@Huawei.com> (raw)
In-Reply-To: <163553711933.2509508.2203471175679990.stgit@dwillia2-desk3.amr.corp.intel.com>
On Fri, 29 Oct 2021 12:51:59 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> During NUMA init, CXL memory defined in the SRAT Memory Affinity
> subtable may be assigned to a NUMA node. Since there is no
> requirement that the SRAT be comprehensive for CXL memory another
> mechanism is needed to assign NUMA nodes to CXL memory not identified
> in the SRAT.
>
> Use the CXL Fixed Memory Window Structure (CFMWS) of the ACPI CXL
> Early Discovery Table (CEDT) to find all CXL memory ranges.
> Create a NUMA node for each CFMWS that is not already assigned to
> a NUMA node. Add a memblk attaching its host physical address
> range to the node.
>
> Note that these ranges may not actually map any memory at boot time.
> They may describe persistent capacity or may be present to enable
> hot-plug.
>
> Consumers can use phys_to_target_node() to discover the NUMA node.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Hi,
I was just discussing this with one of our firmware / ACPI experts and he asked
an interesting question. If you want to have CFMWS regions correspond to
new NUMA nodes, why not put them in SRAT as hotpluggable memory, but have none
present in the memory map (whichever route you use to get that)?
We do this for normal memory hotplug as (via the other discussion on qemu virtio-mem
nodes) apparently does qemu.
https://lore.kernel.org/all/655c65af-fd7a-8007-37b3-a56c60a0ec5b@redhat.com/
This doesn't solve the question of whether we have enough nodes, but it's
not worse than if we use CFMWS regions and fits within existing ACPI spec.
The only reason I can immediately think of to not do this, is that it might be
a pain to later change over to dynamic numa node allocation in a fashion that
then ignores SRAT entries. Probably a solvable problem.
Jonathan
> ---
> drivers/acpi/numa/srat.c | 59 +++++++++++++++++++++++++++++++++++++++++++++-
> drivers/cxl/acpi.c | 3 ++
> 2 files changed, 60 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index b8795fc49097..66a0142dc78c 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -298,6 +298,47 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> out_err:
> return -EINVAL;
> }
> +
> +static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> + void *arg, const unsigned long table_end)
> +{
> + struct acpi_cedt_cfmws *cfmws;
> + int *fake_pxm = arg;
> + u64 start, end;
> + int node;
> +
> + cfmws = (struct acpi_cedt_cfmws *)header;
> + start = cfmws->base_hpa;
> + end = cfmws->base_hpa + cfmws->window_size;
> +
> + /* Skip if the SRAT already described the NUMA details for this HPA */
> + node = phys_to_target_node(start);
> + if (node != NUMA_NO_NODE)
> + return 0;
> +
> + node = acpi_map_pxm_to_node(*fake_pxm);
> +
> + if (node == NUMA_NO_NODE) {
> + pr_err("ACPI NUMA: Too many proximity domains while processing CFMWS.\n");
> + return -EINVAL;
> + }
> +
> + if (numa_add_memblk(node, start, end) < 0) {
> + /* CXL driver must handle the NUMA_NO_NODE case */
> + pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
> + node, start, end);
> + }
> +
> + /* Set the next available fake_pxm value */
> + (*fake_pxm)++;
> + return 0;
> +}
> +#else
> +static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> + void *arg, const unsigned long table_end)
> +{
> + return 0;
> +}
> #endif /* defined(CONFIG_X86) || defined (CONFIG_ARM64) */
>
> static int __init acpi_parse_slit(struct acpi_table_header *table)
> @@ -442,7 +483,7 @@ acpi_table_parse_srat(enum acpi_srat_type id,
>
> int __init acpi_numa_init(void)
> {
> - int cnt = 0;
> + int i, fake_pxm, cnt = 0;
>
> if (acpi_disabled)
> return -EINVAL;
> @@ -478,6 +519,22 @@ int __init acpi_numa_init(void)
> /* SLIT: System Locality Information Table */
> acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
>
> + /*
> + * CXL Fixed Memory Window Structures (CFMWS) must be parsed
> + * after the SRAT. Create NUMA Nodes for CXL memory ranges that
> + * are defined in the CFMWS and not already defined in the SRAT.
> + * Initialize a fake_pxm as the first available PXM to emulate.
> + */
> +
> + /* fake_pxm is the next unused PXM value after SRAT parsing */
> + for (i = 0, fake_pxm = -1; i < MAX_NUMNODES - 1; i++) {
> + if (node_to_pxm_map[i] > fake_pxm)
> + fake_pxm = node_to_pxm_map[i];
> + }
> + fake_pxm++;
> + acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
> + &fake_pxm);
> +
> if (cnt < 0)
> return cnt;
> else if (!parsed_numa_memblks)
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 91e4072e7649..3163167ecc3a 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -125,7 +125,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> cfmws->base_hpa + cfmws->window_size - 1);
> return 0;
> }
> - dev_dbg(dev, "add: %s range %#llx-%#llx\n", dev_name(&cxld->dev),
> + dev_dbg(dev, "add: %s node: %d range %#llx-%#llx\n",
> + dev_name(&cxld->dev), phys_to_target_node(cxld->range.start),
> cfmws->base_hpa, cfmws->base_hpa + cfmws->window_size - 1);
>
> return 0;
>
next prev parent reply other threads:[~2021-11-18 13:12 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-29 19:51 [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Dan Williams
2021-10-29 19:51 ` [PATCH 1/6] ACPI: Keep sub-table parsing infrastructure available for modules Dan Williams
2021-10-29 19:51 ` [PATCH 2/6] ACPI: Teach ACPI table parsing about the CEDT header format Dan Williams
2021-10-29 19:51 ` [PATCH 3/6] ACPI: Add a context argument for table parsing handlers Dan Williams
2021-10-29 19:51 ` [PATCH 4/6] cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers Dan Williams
2021-10-29 19:51 ` [PATCH 5/6] cxl/test: Mock acpi_table_parse_cedt() Dan Williams
2021-10-29 19:51 ` [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT Dan Williams
2021-11-18 13:12 ` Jonathan Cameron [this message]
2021-11-18 17:14 ` Dan Williams
2021-11-18 17:53 ` Jonathan Cameron
2021-11-18 18:10 ` Dan Williams
2021-11-18 18:18 ` Jonathan Cameron
2021-11-01 12:00 ` [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Jonathan Cameron
2021-11-02 3:41 ` Dan Williams
2021-11-02 17:44 ` Jonathan Cameron
2021-11-05 15:07 ` Rafael J. Wysocki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211118131218.00005645@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=rafael.j.wysocki@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox