From: Mike Rapoport <rppt@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Dave Jiang <dave.jiang@intel.com>,
x86@kernel.org, linux-cxl@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
rppt@linux.ibm.com
Subject: Re: [PATCH 1/2] x86/numa: Introduce numa_fill_memblks()
Date: Wed, 7 Jun 2023 13:44:50 +0300 [thread overview]
Message-ID: <20230607104450.GE52412@kernel.org> (raw)
In-Reply-To: <647f9aef8928d_1433ac294ee@dwillia2-xfh.jf.intel.com.notmuch>
On Tue, Jun 06, 2023 at 01:45:35PM -0700, Dan Williams wrote:
> [ add Mike, see "[Mike]" note below... ]
>
> Alison Schofield wrote:
> > On Sat, Jun 03, 2023 at 04:53:13PM -0700, Dan Williams wrote:
> > > alison.schofield@ wrote:
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > >
> > > > numa_fill_memblks() fills in the gaps in numa_meminfo memblks
> > > > over an HPA address range.
> > > >
> > > > The initial use case is the ACPI driver that needs to extend
> > > > SRAT defined proximity domains to an entire CXL CFMWS Window[1].
> > >
> > > I feel like this demands more explanation because the "need" is not
> > > apparent. In fact its a Linux policy choice not a requirement. The next
> > > patch has some of this, but this story is needed earlier for someone
> > > that reads this patch first. Something like:
> > >
> >
> > Hi Dan,
> >
> > Thanks for the review :)
> >
> > Sure, I can add the story below to make the 'need' for this function
> > more apparent, as well as s/needs/want so as not to conflate need with
> > requirement.
> >
> > > ---
> > >
> > > The CFWMS is an ACPI data structure that indicates *potential* locations
> > > where CXL memory can be placed. It is the playground where the CXL
> > > driver has free reign to establish regions. That space can be populated
> > > by BIOS created regions, or driver created regions, after hotplug or
> > > other reconfiguration.
> > >
> > > When the BIOS creates a region in a CXL Window it additionally describes
> > > that subset of the Window range in the other typical ACPI tables SRAT,
> > > SLIT, and HMAT. The rationale for the BIOS not pre-describing the entire
> > > CXL Window in SRAT, SLIT, and HMAT is that it can not predict the
> > > future. I.e. there is nothing stopping higher or lower performance
> > > devices being placed in the same Window. Compare that to ACPI memory
> > > hotplug that just onlines additional capacity in the proximity domain
> > > with little freedom for dynamic performance differentiation.
> > >
> > > That leaves the OS with a choice, should unpopulated window capacity
> > > match the proximity domain of an existing region, or should it allocate
> > > a new one? This patch takes the simple position of minimizing proximity
> > > domain proliferation and reuse any proximity domain intersection for the
> > > entire Window. If the Window has no intersections then allocate a new
> > > proximity domain. Note that SRAT, SLIT and HMAT information can be
> > > enumerated dynamically in a standard way from device provided data.
> > > Think of CXL as the end of ACPI needing to describe memory attributes,
> > > CXL offers a standard discovery model for performance attributes, but
> > > Linux still needs to interoperate with the old regime.
> > >
> > > ---
> > >
> > > >
> > > > The APCI driver expects to use numa_fill_memblks() while parsing
> > >
> > > s/APCI/ACPI/
> > >
> > > Again, the ACPI code does not have any expectation, this is pure OS
> > > policy decision about how to handle undescribed memory.
> > >
> >
> > The intent was to show the pending use case, perhaps 'wants to' use
> > this function to enact a purely OS policy decision!
>
> Sounds good, yeah I tend to read "need" as a requirement and assume that
> Linux is out of spec or something breaks if it does not do the needed
> thing.
>
> >
> >
> > > > the CFMWS. Extending the memblks created during SRAT parsing, to
> > > > cover the entire CFMWS Window, is desirable because everything in
> > > > a CFMWS Window is expected to be of a similar performance class.
> > > >
> > > > Requires CONFIG_NUMA_KEEP_MEMINFO.
> > >
> > > Not sure this adds anything to the description.
> > >
> > > >
> > > > [1] A CXL CFMWS Window represents a contiguous CXL memory resource,
> > > > aka an HPA range. The CFMWS (CXL Fixed Memory Window Structure) is
> > > > part of the ACPI CEDT (CXL Early Discovery Table).
> > > >
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > ---
> > > > arch/x86/include/asm/sparsemem.h | 2 +
> > > > arch/x86/mm/numa.c | 82 ++++++++++++++++++++++++++++++++
> > > > include/linux/numa.h | 7 +++
> > > > 3 files changed, 91 insertions(+)
> > > >
> > > > diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
> > > > index 64df897c0ee3..1be13b2dfe8b 100644
> > > > --- a/arch/x86/include/asm/sparsemem.h
> > > > +++ b/arch/x86/include/asm/sparsemem.h
> > > > @@ -37,6 +37,8 @@ extern int phys_to_target_node(phys_addr_t start);
> > > > #define phys_to_target_node phys_to_target_node
> > > > extern int memory_add_physaddr_to_nid(u64 start);
> > > > #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
> > > > +extern int numa_fill_memblks(u64 start, u64 end);
> > > > +#define numa_fill_memblks numa_fill_memblks
> > >
> > > What is this for? The other defines are due to being an arch-specific
> > > API and the #define is how the arch declares that it has a local version
> > > to replace the generic one.
> >
> > That define, along with the numa.h change below, are to support builds of
> > CONFIG_ARM64 and CONFIG_LOONGARCH, both include the caller acpi_parse_cfmws(),
> > of numa_fill_memblks().
>
> [Mike]
>
> Hmm, ok, but this is piling onto the maintenance burden of x86 not
> getting onboard with memblock for numa info yet. At a minimum that
> avoidance of touching the ARM64 and LOONGARCH cases needs to be called
> out, but it would be useful to have a discussion about the options here
> with questions like:
>
> - What's blocking x86 from switching to memblock?
To start with, someone need to work on it :)
There are some differences in how drivers/base/arch_numa.c and
arch/x86/mm/numa.c handle SRAT ranges. E.g. x86 checks that SRAT covers all
the memory reported by e820 and have this peculiar dance around hotplugable
memory for the sake of movable_node.
Another x86 specific thing that is build around numa_meminfo is the
numa_emulation.
I don't see a conceptual reason why arch_numa.c cannot handle x86, but
that's quite some work needed to make that happen.
> - Or, does the memblock API support what numa_fill_memblks() wants to
> do? I.e. add a real numa_fill_memblks() implementation to
> drivers/base/arch_numa.c rather than skip SRAT based fixups for the
> generic case.
memblock does not have a notion of empty physical ranges, so it will
require a new set of regions to support what numa_fill_memblks() wants to
do.
With this patch numa_meminfo essentially becomes a superset of
memblock.memory and to have a generic implementation in
drivers/base/arch_numa.c this set should be kept somewhere.
> Last I remember it was the conceptual disconnect of x86 not marking Reserved
> ranges as memory like other architectures:
>
> https://lore.kernel.org/all/20200708091520.GE128651@kernel.org/
This was more about e820 vs memblock, I don't think it's relevant here.
> ...but its been a while since this last came up and I have not been
> following memblock developments. Maybe the anwser is the same in the
> end, add x86-specific numa_fill_memblks, but this is as good a time as
> any to revisit carrying that burden.
I've been thinking about how to make arch_numa to support x86 and (sigh)
loongarch, and the simplest way looks like shoving numa_meminfo there and
then optimizing redundant pieces.
For CXL on arm64/riscv we'd need a new data structure for empty physical
ranges anyway at some point and numa_meminfo quite fits the requirements.
We can later reconsider numa_meminfo vs memblock relationship.
That said, add x86-specific numa_fill_memblks and revisit this later is a
option as well :)
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2023-06-07 10:45 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-19 0:04 [PATCH 0/2] CXL: Apply SRAT defined PXM to entire CFMWS window alison.schofield
2023-05-19 0:04 ` [PATCH 1/2] x86/numa: Introduce numa_fill_memblks() alison.schofield
2023-05-19 0:08 ` Dave Hansen
2023-05-19 0:26 ` Alison Schofield
2023-05-19 0:40 ` Dan Williams
2023-05-19 0:43 ` Dave Hansen
2023-05-19 1:56 ` Dan Williams
2023-06-03 23:53 ` Dan Williams
2023-06-06 20:03 ` Alison Schofield
2023-06-06 20:45 ` Dan Williams
2023-06-07 10:44 ` Mike Rapoport [this message]
2023-05-19 0:04 ` [PATCH 2/2] ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window alison.schofield
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230607104450.GE52412@kernel.org \
--to=rppt@kernel.org \
--cc=Jonathan.Cameron@huawei.com \
--cc=akpm@linux-foundation.org \
--cc=alison.schofield@intel.com \
--cc=bp@alien8.de \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave.jiang@intel.com \
--cc=hpa@zytor.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=rppt@linux.ibm.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox