* [PATCH] x86/NUMA: improve memnode_shift calculation for multi node system
From: Jan Beulich @ 2022-09-27 16:20 UTC
  To: xen-devel@lists.xenproject.org
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

SRAT may describe an individual node using multiple ranges. When such
ranges are consecutive in address order (with or without a gap in
between), only the start of the first of them actually needs accounting
for. Furthermore, the start address of the very first range doesn't need
considering at all, as it's fine to associate all lower addresses (with
no memory) with that same node. For this to work, the array of ranges
needs to be sorted by address; adjust the logic in
acpi_numa_memory_affinity_init() accordingly.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
On my Dinar and Rome systems this reduces memnodemapsize to a single
page each. Previously they used 9 and 130 pages respectively (with
shifts going from 8 and 6 to 15 and 16), resulting from lowmem gaps
[A0,FF] and [A0,BF] respectively.

This goes on top of "x86/NUMA: correct memnode_shift calculation for
single node system".
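
To illustrate the effect described above, here is a minimal standalone
sketch of the shift calculation. It is a simplified model, not the Xen
code: it works on plain page frame numbers (no PDX compression), and
struct range, lsb_shift() and the skip_redundant flag are hypothetical
names introduced only for this example. The fallback for an empty
bitfield is likewise an arbitrary choice of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified range: page frame numbers, end exclusive. */
struct range {
    unsigned long start, end;
};

/*
 * OR together the start addresses that matter and return the index of
 * the lowest set bit. The ranges are assumed to be sorted by address.
 * With skip_redundant == 0 every non-empty range contributes its start
 * (old behaviour). With skip_redundant != 0 the very first range and
 * any range belonging to the same node as its predecessor are skipped
 * (the behaviour the patch aims for).
 */
static unsigned int lsb_shift(const struct range *r, const unsigned char *nid,
                              unsigned int n, int skip_redundant)
{
    unsigned long bitfield = 0;
    unsigned int i, shift = 0;

    assert(n == 0 || r != NULL);

    for (i = 0; i < n; i++) {
        if (r[i].start >= r[i].end)
            continue; /* empty range */
        if (!skip_redundant || (i && (!nid || nid[i - 1] != nid[i])))
            bitfield |= r[i].start;
    }

    /* Sketch-only fallback: no start bits collected, use the widest shift. */
    if (!bitfield)
        return 8 * sizeof(bitfield) - 1;

    while (!(bitfield & 1)) {
        bitfield >>= 1;
        shift++;
    }
    return shift;
}
```

For a made-up layout with node 0 covering [0,0xA0) and [0x100,0x80000)
and node 1 covering [0x80000,0x100000), the old scheme yields a shift of
8 (from the 0x100 start after the lowmem gap), while skipping the
redundant starts yields 19, as only the 0x80000 boundary between the two
nodes remains.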

--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -127,7 +127,8 @@ static int __init extract_lsb_from_nodes
         epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
         if ( spdx >= epdx )
             continue;
-        bitfield |= spdx;
+        if ( i && (!nodeids || nodeids[i - 1] != nodeids[i]) )
+            bitfield |= spdx;
         if ( !i || !nodeids || nodeids[i - 1] != nodeids[i] )
             nodes_used++;
         if ( epdx > memtop )
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -312,6 +312,7 @@ acpi_numa_memory_affinity_init(const str
 	unsigned pxm;
 	nodeid_t node;
 	unsigned int i;
+	bool next = false;
 
 	if (srat_disabled())
 		return;
@@ -413,14 +414,37 @@ acpi_numa_memory_affinity_init(const str
 	       node, pxm, start, end - 1,
 	       ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE ? " (hotplug)" : "");
 
-	node_memblk_range[num_node_memblks].start = start;
-	node_memblk_range[num_node_memblks].end = end;
-	memblk_nodeid[num_node_memblks] = node;
+	/* Keep node_memblk_range[] sorted by address. */
+	for (i = 0; i < num_node_memblks; ++i)
+		if (node_memblk_range[i].start > start ||
+		    (node_memblk_range[i].start == start &&
+		     node_memblk_range[i].end > end))
+			break;
+
+	memmove(&node_memblk_range[i + 1], &node_memblk_range[i],
+	        (num_node_memblks - i) * sizeof(*node_memblk_range));
+	node_memblk_range[i].start = start;
+	node_memblk_range[i].end = end;
+
+	memmove(&memblk_nodeid[i + 1], &memblk_nodeid[i],
+	        (num_node_memblks - i) * sizeof(*memblk_nodeid));
+	memblk_nodeid[i] = node;
+
 	if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
-		__set_bit(num_node_memblks, memblk_hotplug);
+		next = true;
 		if (end > mem_hotplug)
 			mem_hotplug = end;
 	}
+	for (; i <= num_node_memblks; ++i) {
+		bool prev = next;
+
+		next = test_bit(i, memblk_hotplug);
+		if (prev)
+			__set_bit(i, memblk_hotplug);
+		else
+			__clear_bit(i, memblk_hotplug);
+	}
+
 	num_node_memblks++;
 }
 

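The sorted insertion in the srat.c hunk above has to keep three parallel
structures in step: the range array, the node ID array, and the hotplug
bitmap. A minimal standalone sketch of that idea follows. It is an
illustration under stated assumptions rather than the Xen code: struct
blk_table, blk_insert() and MAX_BLKS are hypothetical, and a plain
32-bit word stands in for the memblk_hotplug bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLKS 16 /* hypothetical capacity, must not exceed 32 */

struct blk {
    uint64_t start, end;
};

/* Hypothetical container bundling the three parallel structures. */
struct blk_table {
    struct blk range[MAX_BLKS];
    uint8_t nodeid[MAX_BLKS];
    uint32_t hotplug; /* bit i set => range[i] is hot-pluggable */
    unsigned int count;
};

static void blk_insert(struct blk_table *t, uint64_t start, uint64_t end,
                       uint8_t node, bool hotplug)
{
    unsigned int i;
    bool next = hotplug;

    assert(t->count < MAX_BLKS);

    /* Find the first entry sorting after the new one (by start, then end). */
    for (i = 0; i < t->count; ++i)
        if (t->range[i].start > start ||
            (t->range[i].start == start && t->range[i].end > end))
            break;

    /* Open a slot at index i in both arrays; regions overlap, so memmove. */
    memmove(&t->range[i + 1], &t->range[i],
            (t->count - i) * sizeof(t->range[0]));
    t->range[i].start = start;
    t->range[i].end = end;

    memmove(&t->nodeid[i + 1], &t->nodeid[i],
            (t->count - i) * sizeof(t->nodeid[0]));
    t->nodeid[i] = node;

    /* Shift the bitmap up by one position from index i onwards. */
    for (; i <= t->count; ++i) {
        bool prev = next;

        next = (t->hotplug >> i) & 1u;
        if (prev)
            t->hotplug |= UINT32_C(1) << i;
        else
            t->hotplug &= ~(UINT32_C(1) << i);
    }

    t->count++;
}
```

Starting from a zero-initialised table, inserting entries out of address
order leaves range[] sorted, with each hotplug bit still attached to the
range it was recorded for.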


