From: Alison Schofield <alison.schofield@intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Ben Widawsky <ben.widawsky@intel.com>,
linux-cxl@vger.kernel.org,
Linux ACPI <linux-acpi@vger.kernel.org>
Subject: Re: [PATCH v3] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
Date: Fri, 29 Oct 2021 15:49:14 -0700
Message-ID: <20211029224914.GA500689@alison-desk>
In-Reply-To: <CAPcyv4hdC4Uj8YdePMZGFkxgP10VSkX1tiY+ApPctyjfURPSOg@mail.gmail.com>
On Mon, Oct 25, 2021 at 07:47:32PM -0700, Dan Williams wrote:
> On Mon, Oct 18, 2021 at 10:01 PM <alison.schofield@intel.com> wrote:
> >
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > During NUMA init, CXL memory defined in the SRAT Memory Affinity
> > subtable may be assigned to a NUMA node. Since there is no
> > requirement that the SRAT be comprehensive for CXL memory another
> > mechanism is needed to assign NUMA nodes to CXL memory not identified
> > in the SRAT.
> >
> > Use the CXL Fixed Memory Window Structure (CFMWS) of the ACPI CXL
> > Early Discovery Table (CEDT) to find all CXL memory ranges.
> > Create a NUMA node for each CFMWS that is not already assigned to
> > a NUMA node. Add a memblk attaching its host physical address
> > range to the node.
> >
> > Note that these ranges may not actually map any memory at boot time.
> > They may describe persistent capacity or may be present to enable
> > hot-plug.
> >
> > Consumers can use phys_to_target_node() to discover the NUMA node.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
The next version of this patch is now part of the patchset that
adds helpers for parsing the CEDT subtables:
https://lore.kernel.org/linux-cxl/163553711933.2509508.2203471175679990.stgit@dwillia2-desk3.amr.corp.intel.com/T/#mf40d84e1ad4c01f69f794d591b07774255993185
It addresses Dan's comments below.
>> snip
> > +{
> > + struct acpi_cedt_cfmws *cfmws;
> > + acpi_size len, cur = 0;
> > + int i, node, pxm = 0;
>
> Shouldn't this be -1, on the idea that the first numa node to assign
> if none are set is zero?
>
> I don't think the way you have it is a problem in practice because
> SRAT should always be there in a NUMA system. However, the first CFMWS
> pxm should start after the last SRAT entry, or 0 if no SRAT entries.
>
> > + void *cedt_subtable;
> > + u64 start, end;
> > +
> > + /* Find the max PXM defined in the SRAT */
> > + for (i = 0; i < MAX_NUMNODES - 1; i++) {
>
> How about:
>
> for (i = 0, pxm = -1; i < MAX_NUMNODES -1; i++)
>
> ...just to keep the initialization close to the use, but that's just a
> personal style preference.
Done.
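
For anyone following along, here is a minimal standalone sketch of the
max-PXM scan with the initialization folded into the for statement. It
is not the code from the patchset: first_free_pxm(), the nr_nodes
parameter and the local PXM_INVAL define are stand-ins chosen for
illustration, and the map is assumed to hold -1 for nodes with no PXM.

```c
#include <stddef.h>

#define PXM_INVAL (-1)	/* assumed marker: no PXM assigned to this node */

/*
 * Return the first PXM value that is free for a CFMWS-backed node:
 * one past the largest PXM in the map, or 0 if the map has no entries.
 */
static int first_free_pxm(const int *node_to_pxm, size_t nr_nodes)
{
	size_t i;
	int pxm;

	/* Starting at -1 means an empty or all-invalid map yields PXM 0 */
	for (i = 0, pxm = PXM_INVAL; i < nr_nodes; i++) {
		if (node_to_pxm[i] > pxm)
			pxm = node_to_pxm[i];
	}

	/* Fake PXM values start right after the largest one found */
	return pxm + 1;
}
```

With pxm starting at -1, a map that has no valid entries gives a first
fake PXM of 0, which is the case Dan raised.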
>
> > + if (node_to_pxm_map[i] > pxm)
> > + pxm = node_to_pxm_map[i];
> > + }
> > + /* Start assigning fake PXM values after the SRAT max PXM */
> > + pxm++;
> > +
> > + len = acpi_cedt->length - sizeof(*acpi_cedt);
> > + cedt_subtable = acpi_cedt + 1;
> > +
> > + while (cur < len) {
>
> Similarly to above I wonder if this would be cleaner as a for loop
> then you could use typical "continue" statements rather than goto. I
> took a stab at creating a for_each_cedt() helper which ended up a
> decent cleanup for drivers/cxl/
>
> drivers/cxl/acpi.c | 48 +++++++++++++++---------------------------------
> 1 file changed, 15 insertions(+), 33 deletions(-)
>
> ...however, I just realized this NUMA code is running at init time, so
> it can just use the acpi_table_parse_entries_array() helper to walk
> the CFMWS like the other subtable walkers in acpi_numa_init(). You
> would need to update the subtable helpers (acpi_get_subtable_type() et
> al) to recognize the CEDT case.
>
> [ Side note for the implications of acpi_table_parse_entries_array()
> for drivers/cxl/acpi.c ]
>
> Rafael, both the NFIT driver and now the CXL ACPI driver have open
> coded subtable parsing. Any philosophical reason to keep the subtable
> parsing code as __init? It can still be __init and thrown away if
> those drivers are not build-time enabled.
>
The updated patch (in the greater patchset) now uses the new helpers.
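
As a rough illustration of the for-loop-with-continue shape suggested
above, here is a standalone sketch that walks a packed buffer of
variable-length subtables. It is not the helper from the patchset:
struct sub_hdr, SUB_TYPE_CFMWS and count_subtables() are hypothetical
names, and the header layout (type, reserved, 16-bit length) is assumed
to mirror the CEDT subtable header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a CEDT subtable header */
struct sub_hdr {
	uint8_t type;
	uint8_t reserved;
	uint16_t length;	/* whole subtable, header included */
};

#define SUB_TYPE_CFMWS 1	/* assumed type value for a CFMWS entry */

/* Count the subtables of the wanted type in a packed buffer. */
static int count_subtables(const uint8_t *buf, size_t len, uint8_t wanted)
{
	struct sub_hdr hdr;
	size_t cur;
	int count = 0;

	for (cur = 0; cur + sizeof(hdr) <= len; cur += hdr.length) {
		/* Copy the header out so alignment of buf does not matter */
		memcpy(&hdr, buf + cur, sizeof(hdr));

		/* Stop on a malformed entry instead of looping or overrunning */
		if (hdr.length < sizeof(hdr) || hdr.length > len - cur)
			break;

		if (hdr.type != wanted)
			continue;	/* the loop increment still advances cur */

		count++;
	}

	return count;
}
```

Because the advance lives in the for statement, skipping an entry is a
plain continue and no goto label is needed.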
> snip
> > + node = acpi_map_pxm_to_node(pxm);
> > + if (node == NUMA_NO_NODE) {
> > + pr_err("ACPI NUMA: Too many proximity domains.\n");
>
> I would add "while processing CFMWS" to make it clear that the BIOS
> technically did not declare too many PXMs; it was the Linux heuristic
> for opportunistically emulating more PXMs.
>
Done.
> > snip
> > }
> > diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> > index 974d497a897d..f837fd715440 100644
> > --- a/include/linux/acpi.h
> > +++ b/include/linux/acpi.h
> > @@ -426,6 +426,7 @@ extern bool acpi_osi_is_win8(void);
> > #ifdef CONFIG_ACPI_NUMA
> > int acpi_map_pxm_to_node(int pxm);
> > int acpi_get_node(acpi_handle handle);
> > +int __init numa_add_memblk(int nodeid, u64 start, u64 end);
>
> This doesn't belong here.
>
> There is already a declaration for this in
> arch/x86/include/asm/numa.h. I think what you are missing is that your
> new code needs to be within the same ifdef guards as the other helpers
> in srat.c that call numa_add_memblk(). See the line that has:
>
> #if defined(CONFIG_X86) || defined(CONFIG_ARM64) || defined(CONFIG_LOONGARCH)
>
> ...above acpi_numa_slit_init()
Done.