* [PATCH] x86/NUMA: don't account hotplug regions
@ 2015-08-28 13:59 Jan Beulich
From: Jan Beulich @ 2015-08-28 13:59 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper, Jim Fehlig, Keir Fraser, Wei Liu
... except in cases where they really matter: node_memblk_range[] now
is the only place all regions get stored. nodes[] and NODE_DATA() track
present memory only. This improves the reporting when nodes have
disjoint "normal" and hotplug regions, with the hotplug region sitting
above the highest populated page. In such cases a node's spanned-pages
value (visible in both XEN_SYSCTL_numainfo and 'u' debug key output)
covered all the way up to the top of populated memory, giving quite a
different picture from what an otherwise identically configured system
without any hotplug regions would report. Note, however, that the
actual hotplug case (as well as cases of nodes with multiple disjoint
present regions) is still not being handled such that the reported
values would represent how much memory a node really has (but that can
be considered intentional).

Reported-by: Jim Fehlig <jfehlig@suse.com>

This at once makes nodes_cover_memory() no longer consider E820_RAM
regions covered by SRAT hotplug regions.

Also reject self-overlaps with mismatching hotplug flags.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -39,6 +39,7 @@ static unsigned node_to_pxm(nodeid_t n);
static int num_node_memblks;
static struct node node_memblk_range[NR_NODE_MEMBLKS];
static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
+static __initdata DECLARE_BITMAP(memblk_hotplug, NR_NODE_MEMBLKS);
static inline bool_t node_found(unsigned idx, unsigned pxm)
{
@@ -126,9 +127,9 @@ static __init int conflicting_memblks(u6
if (nd->start == nd->end)
continue;
if (nd->end > start && nd->start < end)
- return memblk_nodeid[i];
+ return i;
if (nd->end == end && nd->start == start)
- return memblk_nodeid[i];
+ return i;
}
return -1;
}
@@ -269,7 +270,6 @@ acpi_numa_processor_affinity_init(struct
void __init
acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
{
- struct node *nd;
u64 start, end;
unsigned pxm;
nodeid_t node;
@@ -304,30 +304,40 @@ acpi_numa_memory_affinity_init(struct ac
}
/* It is fine to add this area to the nodes data it will be used later*/
i = conflicting_memblks(start, end);
- if (i == node) {
- printk(KERN_WARNING
- "SRAT: Warning: PXM %d (%"PRIx64"-%"PRIx64") overlaps with itself (%"
- PRIx64"-%"PRIx64")\n", pxm, start, end, nodes[i].start, nodes[i].end);
- } else if (i >= 0) {
+ if (i < 0)
+ /* everything fine */;
+ else if (memblk_nodeid[i] == node) {
+ bool_t mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
+ !test_bit(i, memblk_hotplug);
+
+ printk("%sSRAT: PXM %u (%"PRIx64"-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
+ mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
+ node_memblk_range[i].start, node_memblk_range[i].end);
+ if (mismatch) {
+ bad_srat();
+ return;
+ }
+ } else {
printk(KERN_ERR
- "SRAT: PXM %d (%"PRIx64"-%"PRIx64") overlaps with PXM %d (%"
- PRIx64"-%"PRIx64")\n", pxm, start, end, node_to_pxm(i),
- nodes[i].start, nodes[i].end);
+ "SRAT: PXM %u (%"PRIx64"-%"PRIx64") overlaps with PXM %u (%"PRIx64"-%"PRIx64")\n",
+ pxm, start, end, node_to_pxm(memblk_nodeid[i]),
+ node_memblk_range[i].start, node_memblk_range[i].end);
bad_srat();
return;
}
- nd = &nodes[node];
- if (!node_test_and_set(node, memory_nodes_parsed)) {
- nd->start = start;
- nd->end = end;
- } else {
- if (start < nd->start)
+ if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
+ struct node *nd = &nodes[node];
+
+ if (!node_test_and_set(node, memory_nodes_parsed)) {
nd->start = start;
- if (nd->end < end)
nd->end = end;
+ } else {
+ if (start < nd->start)
+ nd->start = start;
+ if (nd->end < end)
+ nd->end = end;
+ }
}
- if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && end > mem_hotplug)
- mem_hotplug = end;
printk(KERN_INFO "SRAT: Node %u PXM %u %"PRIx64"-%"PRIx64"%s\n",
node, pxm, start, end,
ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE ? " (hotplug)" : "");
@@ -335,6 +345,11 @@ acpi_numa_memory_affinity_init(struct ac
node_memblk_range[num_node_memblks].start = start;
node_memblk_range[num_node_memblks].end = end;
memblk_nodeid[num_node_memblks] = node;
+ if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
+ __set_bit(num_node_memblks, memblk_hotplug);
+ if (end > mem_hotplug)
+ mem_hotplug = end;
+ }
num_node_memblks++;
}
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: [PATCH] x86/NUMA: don't account hotplug regions
From: Andrew Cooper @ 2015-08-28 14:55 UTC (permalink / raw)
To: Jan Beulich, xen-devel; +Cc: Keir Fraser, Jim Fehlig, Wei Liu
On 28/08/15 14:59, Jan Beulich wrote:
> ... except in cases where they really matter: node_memblk_range[] now
> is the only place all regions get stored. nodes[] and NODE_DATA() track
> present memory only. This improves the reporting when nodes have
> disjoint "normal" and hotplug regions, with the hotplug region sitting
> above the highest populated page. In such cases a node's spanned-pages
> value (visible in both XEN_SYSCTL_numainfo and 'u' debug key output)
> covered all the way up to the top of populated memory, giving quite a
> different picture from what an otherwise identically configured
> system without any hotplug regions would report. Note, however, that
> the actual hotplug case (as well as cases of nodes with multiple
> disjoint present regions) is still not being handled such that the
> reported values would represent how much memory a node really has (but
> that can be considered intentional).
>
> Reported-by: Jim Fehlig <jfehlig@suse.com>
>
> This at once makes nodes_cover_memory() no longer consider E820_RAM
> regions covered by SRAT hotplug regions.
>
> Also reject self-overlaps with mismatching hotplug flags.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
* Re: [PATCH] x86/NUMA: don't account hotplug regions
From: Jim Fehlig @ 2015-08-28 21:33 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Wei Liu, Andrew Cooper
Jan Beulich wrote:
> ... except in cases where they really matter: node_memblk_range[] now
> is the only place all regions get stored. nodes[] and NODE_DATA() track
> present memory only. This improves the reporting when nodes have
> disjoint "normal" and hotplug regions, with the hotplug region sitting
> above the highest populated page. In such cases a node's spanned-pages
> value (visible in both XEN_SYSCTL_numainfo and 'u' debug key output)
> covered all the way up to the top of populated memory, giving quite a
> different picture from what an otherwise identically configured
> system without any hotplug regions would report. Note, however, that
> the actual hotplug case (as well as cases of nodes with multiple
> disjoint present regions) is still not being handled such that the
> reported values would represent how much memory a node really has (but
> that can be considered intentional).
>
> Reported-by: Jim Fehlig <jfehlig@suse.com>
>
> This at once makes nodes_cover_memory() no longer consider E820_RAM
> regions covered by SRAT hotplug regions.
>
> Also reject self-overlaps with mismatching hotplug flags.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
Tested-by: Jim Fehlig <jfehlig@suse.com>
* Re: [PATCH] x86/NUMA: don't account hotplug regions
From: Wei Liu @ 2015-08-31 11:47 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Jim Fehlig, Keir Fraser, Wei Liu, Andrew Cooper
On Fri, Aug 28, 2015 at 07:59:45AM -0600, Jan Beulich wrote:
> ... except in cases where they really matter: node_memblk_range[] now
> is the only place all regions get stored. nodes[] and NODE_DATA() track
> present memory only. This improves the reporting when nodes have
> disjoint "normal" and hotplug regions, with the hotplug region sitting
> above the highest populated page. In such cases a node's spanned-pages
> value (visible in both XEN_SYSCTL_numainfo and 'u' debug key output)
> covered all the way up to the top of populated memory, giving quite a
> different picture from what an otherwise identically configured
> system without any hotplug regions would report. Note, however, that
> the actual hotplug case (as well as cases of nodes with multiple
> disjoint present regions) is still not being handled such that the
> reported values would represent how much memory a node really has (but
> that can be considered intentional).
>
> Reported-by: Jim Fehlig <jfehlig@suse.com>
>
> This at once makes nodes_cover_memory() no longer consider E820_RAM
> regions covered by SRAT hotplug regions.
>
> Also reject self-overlaps with mismatching hotplug flags.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>