From: Andre Przywara <andre.przywara@amd.com>
To: Keir Fraser <keir.fraser@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [PATCH] numa: fix problems with memory-less nodes
Date: Wed, 13 Jan 2010 10:42:26 +0100
Message-ID: <4B4D9582.5010806@amd.com>
In-Reply-To: <C773342E.632A%keir.fraser@eu.citrix.com>
Keir Fraser wrote:
> On 12/01/2010 16:30, "Andre Przywara" <andre.przywara@amd.com> wrote:
>
>> If we decided to not report memory-less nodes in physinfo we should also
>> skip them in the node_to_{cpu,memory,dma32_mem} Python lists. Currently
>> Xen will not start guests on machines with memory-less nodes which are
>> not the last ones. On an 8-node machine with empty nodes 4 and 5 "xm
>> info" was reporting wrongly, also the node assignment algorithm crashed
>> with a division by zero error.
>> The attached patch fixes this by skipping empty nodes in the enumeration
>> of resources.
>
> Where to begin? Firstly, I thought that the ordering of nodes in the
> node_to_* lists actually mattered -- the lists are indexed by nodeid (a
> handle which can be passed to other Xen interfaces) are they not? If you
> don't include empty entries, then the index position of entries is no longer
> meaningful.
OK, that seems to be an issue.
To be honest I am not a fan of omitting nodes from physinfo, but that is
what the current code (RC1!) does and it definitely breaks Xen on my
box. So I just made this small patch to make it work again.
Actually I would prefer to revert the patch that crops the number of
nodes reported by physinfo (20762:a1d0a575b4ba ?). Yes, that would result
in nodes being reported with zero memory, but in my tests this did not
cause problems, as a node's memory can (and will) be exhausted even
during normal operation.
To illustrate the problem:
My box has 8 nodes; I removed the memory from nodes 4 and 5.
With the unpatched version xm info says:
total_memory           : 73712
free_memory            : 70865
node_to_cpu            : node0:0-5,24-35
                         node1:6-11
                         node2:12-17
                         node3:18-23
                         node4:no cpus
                         node5:no cpus
node_to_memory         : node0:14267
                         node1:8167
                         node2:16335
                         node3:8167
                         node4:0
                         node5:0
So this listing completely omits the last two nodes (CPUs 36-47 and the
24 GB connected to them). The Xen-internal listing triggered by the
debug key is correct, though:
(XEN) idx0 -> NODE0 start->0 size->4423680
(XEN) phys_to_nid(0000000000001000) -> 0 should be 0
(XEN) idx1 -> NODE1 start->4423680 size->2097152
(XEN) phys_to_nid(0000000438001000) -> 1 should be 1
(XEN) idx2 -> NODE2 start->6520832 size->4194304
(XEN) phys_to_nid(0000000638001000) -> 2 should be 2
(XEN) idx3 -> NODE3 start->10715136 size->2097152
(XEN) phys_to_nid(0000000a38001000) -> 3 should be 3
(XEN) idx6 -> NODE6 start->12812288 size->4194304
(XEN) phys_to_nid(0000000c38001000) -> 6 should be 6
(XEN) idx7 -> NODE7 start->17006592 size->2097152
(XEN) phys_to_nid(0000001038001000) -> 7 should be 7
With the patched xc.so xm info reports:
node_to_cpu            : node0:0-5,24-35
                         node1:6-11
                         node2:12-17
                         node3:18-23
                         node4:36-41
                         node5:42-47
node_to_memory         : node0:14267
                         node1:8167
                         node2:16335
                         node3:8167
                         node4:16335
                         node5:7590
Although memory-less nodes are not very common, they can occur with our
new dual-node processor: since it has in fact a dual-node dual-channel
memory interface, one could (even accidentally) forget to populate
certain memory slots.
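The indexing concern raised in the quoted review can be shown with a small Python sketch. This is not the xc.so or xend code; the names MEM_MB, compact() and keep_ids() are made up for illustration, and the per-node values are read off the listings above on the assumption that the entries printed as node4 and node5 in the patched output really belong to NODE6 and NODE7.

```python
# Memory per node in MB, with the list index equal to the real node id.
# Nodes 4 and 5 are the memory-less ones (assumed layout, see lead-in).
MEM_MB = [14267, 8167, 16335, 8167, 0, 0, 16335, 7590]


def compact(values):
    """Drop empty nodes. The list position no longer matches the node id."""
    return [v for v in values if v]


def keep_ids(values):
    """Drop empty nodes but carry the real node id along with each value."""
    return [(node_id, v) for node_id, v in enumerate(values) if v]


print(compact(MEM_MB)[4])   # 16335, shown as "node4" but it is node 6's memory
print(keep_ids(MEM_MB)[4])  # (6, 16335), the node id survives the filtering
```

With the compacted list, position 4 can no longer be passed to another interface as a node id, whereas the second form (or simply keeping the zero entries) preserves it.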
> Secondly, you avoid appending to the node_to_cpu list if the node is
> cpu-less. But you avoid appending to the node_to_{memory,dma32} lists only
> if the node is *both* cpu-less and memory-less. That's not even consistent.
OK, that's a fair point. I see that the value of node_exists can change.
> Please just fix the crap Python code.
Which part exactly do you mean? The part triggering the division by zero?
I will see if I can fix this properly.
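As a rough idea of what such a fix could look like, here is a minimal Python sketch. It assumes the node assignment normalizes a per-node load by the number of CPUs in that node, which would divide by zero for an empty node. That assumption, the function name least_loaded_node() and its arguments are hypothetical and not taken from the xend sources.

```python
def least_loaded_node(node_load, node_cpus):
    """Return the id of the node with the lowest load per CPU.

    node_load[i] is a load figure for node i and node_cpus[i] is the list
    of CPUs in node i. Both lists are indexed by the real node id, so
    empty nodes stay in place and are skipped instead of being divided by.
    Returns None if no node has any CPUs.
    """
    best_id, best_load = None, None
    for node_id, (load, cpus) in enumerate(zip(node_load, node_cpus)):
        if not cpus:
            continue  # CPU-less node: not a candidate, and no division
        per_cpu = load / len(cpus)
        if best_load is None or per_cpu < best_load:
            best_id, best_load = node_id, per_cpu
    return best_id


# Nodes 2 and 3 are empty; node 4 has the lowest load per CPU.
cpus = [list(range(0, 6)), list(range(6, 12)), [], [], list(range(12, 18))]
print(least_loaded_node([6, 3, 0, 0, 1], cpus))  # 4
```

Keeping the empty entries in the lists and skipping them at the point of use leaves the node ids intact for other callers.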
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632