linux-acpi.vger.kernel.org archive mirror
* [PATCH v4 0/2] CXL: Apply SRAT defined PXM to entire CFMWS window
@ 2023-07-10 20:02 alison.schofield
  2023-07-10 20:02 ` [PATCH v4 1/2] x86/numa: Introduce numa_fill_memblks() alison.schofield
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: alison.schofield @ 2023-07-10 20:02 UTC
  To: Rafael J. Wysocki, Len Brown, Dan Williams, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Andrew Morton, Jonathan Cameron,
	Dave Jiang, Mike Rapoport
  Cc: Alison Schofield, x86, linux-cxl, linux-acpi, linux-kernel

From: Alison Schofield <alison.schofield@intel.com>


Changes in v4:
- Remove useless export of numa_fill_memblks()  (Dan)
- Rebase on latest tip tree

v3: https://lore.kernel.org/linux-cxl/cover.1687645837.git.alison.schofield@intel.com/

----

Cover Letter:

The CXL subsystem requires the creation of NUMA nodes for CFMWS
Windows[1] not described in the SRAT. The existing implementation
only addresses windows that the SRAT describes completely or not
at all. This work addresses the case of partially described CFMWS
Windows by extending proximity domains in a portion of a CFMWS
window to the entire window.

Introduce a NUMA helper, numa_fill_memblks(), to fill gaps in a
numa_meminfo memblk address range. Update the CFMWS parsing in the
ACPI driver to use numa_fill_memblks() to extend SRAT defined
proximity domains to entire CXL windows.

An RFC of this patchset was previously posted here[2] for review by
the CXL folks. The RFC feedback (Dan) led to the implementation here,
which extends existing memblks. Both Jonathan and Dan also influenced
the changelog comments in the ACPI patch with regard to setting
expectations for this evolving heuristic.

Repeating here to set reviewer expectations:
*Note that this heuristic will evolve when CFMWS Windows present a
wider range of characteristics. The extension of the proximity domain,
implemented here, is likely a step in developing a more sophisticated
performance profile in the future.

[1] CFMWS is defined in CXL Spec 3.0, Section 9.17.1.3:
https://www.computeexpresslink.org/spec-landing

A CXL Fixed Memory Window is a region of Host Physical Address (HPA)
space that routes accesses to CXL Host Bridges. The 'S' in CFMWS
stands for the Structure that describes the window, hence its common
name, CFMWS.

[2] https://lore.kernel.org/linux-cxl/cover.1683742429.git.alison.schofield@intel.com/


Alison Schofield (2):
  x86/numa: Introduce numa_fill_memblks()
  ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window

 arch/x86/include/asm/sparsemem.h |  2 +
 arch/x86/mm/numa.c               | 80 ++++++++++++++++++++++++++++++++
 drivers/acpi/numa/srat.c         | 11 +++--
 include/linux/numa.h             |  7 +++
 4 files changed, 97 insertions(+), 3 deletions(-)


base-commit: ac442f6a364dd23bc08086f07b4bc4ef8476a9fe
-- 
2.37.3


* [PATCH v4 1/2] x86/numa: Introduce numa_fill_memblks()
  2023-07-10 20:02 [PATCH v4 0/2] CXL: Apply SRAT defined PXM to entire CFMWS window alison.schofield
@ 2023-07-10 20:02 ` alison.schofield
  2023-07-10 20:02 ` [PATCH v4 2/2] ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window alison.schofield
  2023-08-08  6:56 ` [PATCH v4 0/2] CXL: Apply SRAT defined PXM " Dan Williams
  2 siblings, 0 replies; 6+ messages in thread
From: alison.schofield @ 2023-07-10 20:02 UTC
  To: Rafael J. Wysocki, Len Brown, Dan Williams, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Andrew Morton, Jonathan Cameron,
	Dave Jiang, Mike Rapoport
  Cc: Alison Schofield, x86, linux-cxl, linux-acpi, linux-kernel,
	Derick Marks

From: Alison Schofield <alison.schofield@intel.com>

numa_fill_memblks() fills in the gaps in numa_meminfo memblks
over a physical address range.

The ACPI driver will use numa_fill_memblks() to implement a new Linux
policy that prescribes extending proximity domains in a portion of a
CFMWS window to the entire window.

Dan Williams offered this explanation of the policy:
A CFMWS is an ACPI data structure that indicates *potential* locations
where CXL memory can be placed. It is the playground where the CXL
driver has free rein to establish regions. That space can be populated
by BIOS-created regions, or driver-created regions, after hotplug or
other reconfiguration.

When BIOS creates a region in a CXL Window, it additionally describes
that subset of the Window range in the other typical ACPI tables: SRAT,
SLIT, and HMAT. The rationale for BIOS not pre-describing the entire
CXL Window in SRAT, SLIT, and HMAT is that it cannot predict the
future, i.e. there is nothing stopping higher or lower performance
devices being placed in the same Window. Compare that to ACPI memory
hotplug, which just onlines additional capacity in the proximity domain
with little freedom for dynamic performance differentiation.

That leaves the OS with a choice: should unpopulated window capacity
match the proximity domain of an existing region, or should it allocate
a new one? This patch takes the simple position of minimizing proximity
domain proliferation by reusing any proximity domain intersection for
the entire Window. If the Window has no intersections, allocate a
new proximity domain. Note that SRAT, SLIT, and HMAT information can be
enumerated dynamically in a standard way from device-provided data.
Think of CXL as the end of ACPI needing to describe memory attributes:
CXL offers a standard discovery model for performance attributes, but
Linux still needs to interoperate with the old regime.

Reported-by: Derick Marks <derick.w.marks@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Derick Marks <derick.w.marks@intel.com>
---
 arch/x86/include/asm/sparsemem.h |  2 +
 arch/x86/mm/numa.c               | 80 ++++++++++++++++++++++++++++++++
 include/linux/numa.h             |  7 +++
 3 files changed, 89 insertions(+)

diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
index 64df897c0ee3..1be13b2dfe8b 100644
--- a/arch/x86/include/asm/sparsemem.h
+++ b/arch/x86/include/asm/sparsemem.h
@@ -37,6 +37,8 @@ extern int phys_to_target_node(phys_addr_t start);
 #define phys_to_target_node phys_to_target_node
 extern int memory_add_physaddr_to_nid(u64 start);
 #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
+extern int numa_fill_memblks(u64 start, u64 end);
+#define numa_fill_memblks numa_fill_memblks
 #endif
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 2aadb2019b4f..c01c5506fd4a 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -11,6 +11,7 @@
 #include <linux/nodemask.h>
 #include <linux/sched.h>
 #include <linux/topology.h>
+#include <linux/sort.h>
 
 #include <asm/e820/api.h>
 #include <asm/proto.h>
@@ -961,4 +962,83 @@ int memory_add_physaddr_to_nid(u64 start)
 	return nid;
 }
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
+
+static int __init cmp_memblk(const void *a, const void *b)
+{
+	const struct numa_memblk *ma = *(const struct numa_memblk **)a;
+	const struct numa_memblk *mb = *(const struct numa_memblk **)b;
+
+	return (ma->start > mb->start) - (ma->start < mb->start);
+}
+
+static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
+
+/**
+ * numa_fill_memblks - Fill gaps in numa_meminfo memblks
+ * @start: address to begin fill
+ * @end: address to end fill
+ *
+ * Find and extend numa_meminfo memblks to cover the @start-@end
+ * physical address range, such that the first memblk includes
+ * @start, the last memblk includes @end, and any gaps in between
+ * are filled.
+ *
+ * RETURNS:
+ * 0		  : Success
+ * NUMA_NO_MEMBLK : No memblk exists in @start-@end range
+ */
+
+int __init numa_fill_memblks(u64 start, u64 end)
+{
+	struct numa_memblk **blk = &numa_memblk_list[0];
+	struct numa_meminfo *mi = &numa_meminfo;
+	int count = 0;
+	u64 prev_end;
+
+	/*
+	 * Create a list of pointers to numa_meminfo memblks that
+	 * overlap [start, end). Exclude (start == bi->end) and
+	 * (end == bi->start) since end addresses in both a CFMWS
+	 * range and a memblk range are exclusive.
+	 *
+	 * This list of pointers is used to make in-place changes
+	 * that fill out the numa_meminfo memblks.
+	 */
+	for (int i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *bi = &mi->blk[i];
+
+		if (start < bi->end && end > bi->start) {
+			blk[count] = &mi->blk[i];
+			count++;
+		}
+	}
+	if (!count)
+		return NUMA_NO_MEMBLK;
+
+	/* Sort the list of pointers in memblk->start order */
+	sort(&blk[0], count, sizeof(blk[0]), cmp_memblk, NULL);
+
+	/* Make sure the first/last memblks include start/end */
+	blk[0]->start = min(blk[0]->start, start);
+	blk[count - 1]->end = max(blk[count - 1]->end, end);
+
+	/*
+	 * Fill any gaps by tracking the previous memblks
+	 * end address and backfilling to it if needed.
+	 */
+	prev_end = blk[0]->end;
+	for (int i = 1; i < count; i++) {
+		struct numa_memblk *curr = blk[i];
+
+		if (prev_end >= curr->start) {
+			if (prev_end < curr->end)
+				prev_end = curr->end;
+		} else {
+			curr->start = prev_end;
+			prev_end = curr->end;
+		}
+	}
+	return 0;
+}
+
 #endif
diff --git a/include/linux/numa.h b/include/linux/numa.h
index 59df211d051f..0f512c0aba54 100644
--- a/include/linux/numa.h
+++ b/include/linux/numa.h
@@ -12,6 +12,7 @@
 #define MAX_NUMNODES    (1 << NODES_SHIFT)
 
 #define	NUMA_NO_NODE	(-1)
+#define	NUMA_NO_MEMBLK	(-1)
 
 /* optionally keep NUMA memory info available post init */
 #ifdef CONFIG_NUMA_KEEP_MEMINFO
@@ -43,6 +44,12 @@ static inline int phys_to_target_node(u64 start)
 	return 0;
 }
 #endif
+#ifndef numa_fill_memblks
+static inline int __init numa_fill_memblks(u64 start, u64 end)
+{
+	return NUMA_NO_MEMBLK;
+}
+#endif
 #else /* !CONFIG_NUMA */
 static inline int numa_map_to_online_node(int node)
 {
-- 
2.37.3


* [PATCH v4 2/2] ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window
  2023-07-10 20:02 [PATCH v4 0/2] CXL: Apply SRAT defined PXM to entire CFMWS window alison.schofield
  2023-07-10 20:02 ` [PATCH v4 1/2] x86/numa: Introduce numa_fill_memblks() alison.schofield
@ 2023-07-10 20:02 ` alison.schofield
  2023-09-07 21:42   ` Dave Hansen
  2023-09-12 21:13   ` Dan Williams
  2023-08-08  6:56 ` [PATCH v4 0/2] CXL: Apply SRAT defined PXM " Dan Williams
  2 siblings, 2 replies; 6+ messages in thread
From: alison.schofield @ 2023-07-10 20:02 UTC
  To: Rafael J. Wysocki, Len Brown, Dan Williams, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Andrew Morton, Jonathan Cameron,
	Dave Jiang, Mike Rapoport
  Cc: Alison Schofield, x86, linux-cxl, linux-acpi, linux-kernel,
	Derick Marks

From: Alison Schofield <alison.schofield@intel.com>

Commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
CFMWS not in SRAT") did not account for the case where the BIOS
only partially describes a CFMWS Window in the SRAT. As a result,
the address ranges omitted from a partially described CFMWS Window
do not get assigned to a NUMA node.

Replace the call to phys_to_target_node() with numa_fill_memblks().
numa_fill_memblks() searches an HPA range for existing memblk(s)
and extends those memblk(s) to fill the entire CFMWS Window.

Extending the existing memblks is a simple strategy that reuses
SRAT defined proximity domains from part of a window to fill out
the entire window, based on the knowledge* that all of a CFMWS
window is of a similar performance class.

*Note that this heuristic will evolve when CFMWS Windows present
a wider range of characteristics. The extension of the proximity
domain, implemented here, is likely a step in developing a more
sophisticated performance profile in the future.

There is no change in behavior when the SRAT does not describe
the CFMWS Window at all. In that case, a new NUMA node with a
single memblk covering the entire CFMWS Window is created.

Fixes: fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT")
Reported-by: Derick Marks <derick.w.marks@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Derick Marks <derick.w.marks@intel.com>
---
 drivers/acpi/numa/srat.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 1f4fc5f8a819..12f330b0eac0 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -310,11 +310,16 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
 	start = cfmws->base_hpa;
 	end = cfmws->base_hpa + cfmws->window_size;
 
-	/* Skip if the SRAT already described the NUMA details for this HPA */
-	node = phys_to_target_node(start);
-	if (node != NUMA_NO_NODE)
+	/*
+	 * The SRAT may have already described NUMA details for all,
+	 * or a portion of, this CFMWS HPA range. Extend the memblks
+	 * found for any portion of the window to cover the entire
+	 * window.
+	 */
+	if (!numa_fill_memblks(start, end))
 		return 0;
 
+	/* No SRAT description. Create a new node. */
 	node = acpi_map_pxm_to_node(*fake_pxm);
 
 	if (node == NUMA_NO_NODE) {
-- 
2.37.3


* RE: [PATCH v4 0/2] CXL: Apply SRAT defined PXM to entire CFMWS window
  2023-07-10 20:02 [PATCH v4 0/2] CXL: Apply SRAT defined PXM to entire CFMWS window alison.schofield
  2023-07-10 20:02 ` [PATCH v4 1/2] x86/numa: Introduce numa_fill_memblks() alison.schofield
  2023-07-10 20:02 ` [PATCH v4 2/2] ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window alison.schofield
@ 2023-08-08  6:56 ` Dan Williams
  2 siblings, 0 replies; 6+ messages in thread
From: Dan Williams @ 2023-08-08  6:56 UTC
  To: alison.schofield, Rafael J. Wysocki, Len Brown, Dan Williams,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Andrew Morton,
	Jonathan Cameron, Dave Jiang, Mike Rapoport
  Cc: Alison Schofield, x86, linux-cxl, linux-acpi, linux-kernel

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> 
> Changes in v4:
> - Remove useless export of numa_fill_memblks()  (Dan)
> - Rebase on latest tip tree

This thread has gone quiet. Any concerns from x86/mm folks if I take
this through the CXL tree with an x86 ack? Or anything else I can help
out with on this one?

> 
> v3: https://lore.kernel.org/linux-cxl/cover.1687645837.git.alison.schofield@intel.com/
> 
> ----
> 
> Cover Letter:
> 
> The CXL subsystem requires the creation of NUMA nodes for CFMWS
> Windows[1] not described in the SRAT. The existing implementation
> only addresses windows that the SRAT describes completely or not
> at all. This work addresses the case of partially described CFMWS
> Windows by extending proximity domains in a portion of a CFMWS
> window to the entire window.
[..]

* Re: [PATCH v4 2/2] ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window
  2023-07-10 20:02 ` [PATCH v4 2/2] ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window alison.schofield
@ 2023-09-07 21:42   ` Dave Hansen
  2023-09-12 21:13   ` Dan Williams
  1 sibling, 0 replies; 6+ messages in thread
From: Dave Hansen @ 2023-09-07 21:42 UTC
  To: alison.schofield, Rafael J. Wysocki, Len Brown, Dan Williams,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Andrew Morton,
	Jonathan Cameron, Dave Jiang, Mike Rapoport
  Cc: x86, linux-cxl, linux-acpi, linux-kernel, Derick Marks

On 7/10/23 13:02, alison.schofield@intel.com wrote:
> +	/*
> +	 * The SRAT may have already described NUMA details for all,
> +	 * or a portion of, this CFMWS HPA range. Extend the memblks
> +	 * found for any portion of the window to cover the entire
> +	 * window.
> +	 */
> +	if (!numa_fill_memblks(start, end))
>  		return 0;

FWIW, the pieces didn't really fit together for me for this pair of
patches until I read *this* comment.

Either way:

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

* RE: [PATCH v4 2/2] ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window
  2023-07-10 20:02 ` [PATCH v4 2/2] ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window alison.schofield
  2023-09-07 21:42   ` Dave Hansen
@ 2023-09-12 21:13   ` Dan Williams
  1 sibling, 0 replies; 6+ messages in thread
From: Dan Williams @ 2023-09-12 21:13 UTC
  To: alison.schofield, Rafael J. Wysocki, Len Brown, Dan Williams,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Andrew Morton,
	Jonathan Cameron, Dave Jiang, Mike Rapoport
  Cc: Alison Schofield, x86, linux-cxl, linux-acpi, linux-kernel,
	Derick Marks

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
> CFMWS not in SRAT") did not account for the case where the BIOS
> only partially describes a CFMWS Window in the SRAT. As a result,
> the address ranges omitted from a partially described CFMWS Window
> do not get assigned to a NUMA node.
> 
> Replace the call to phys_to_target_node() with numa_fill_memblks().
> numa_fill_memblks() searches an HPA range for existing memblk(s)
> and extends those memblk(s) to fill the entire CFMWS Window.
> 
> Extending the existing memblks is a simple strategy that reuses
> SRAT defined proximity domains from part of a window to fill out
> the entire window, based on the knowledge* that all of a CFMWS
> window is of a similar performance class.
> 
> *Note that this heuristic will evolve when CFMWS Windows present
> a wider range of characteristics. The extension of the proximity
> domain, implemented here, is likely a step in developing a more
> sophisticated performance profile in the future.
> 
> There is no change in behavior when the SRAT does not describe
> the CFMWS Window at all. In that case, a new NUMA node with a
> single memblk covering the entire CFMWS Window is created.
> 
> Fixes: fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT")
> Reported-by: Derick Marks <derick.w.marks@intel.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Tested-by: Derick Marks <derick.w.marks@intel.com>

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

Thread overview: 6+ messages
2023-07-10 20:02 [PATCH v4 0/2] CXL: Apply SRAT defined PXM to entire CFMWS window alison.schofield
2023-07-10 20:02 ` [PATCH v4 1/2] x86/numa: Introduce numa_fill_memblks() alison.schofield
2023-07-10 20:02 ` [PATCH v4 2/2] ACPI: NUMA: Apply SRAT proximity domain to entire CFMWS window alison.schofield
2023-09-07 21:42   ` Dave Hansen
2023-09-12 21:13   ` Dan Williams
2023-08-08  6:56 ` [PATCH v4 0/2] CXL: Apply SRAT defined PXM " Dan Williams
