From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andre Przywara
Subject: Re: [PATCH] numa: fix problems with memory-less nodes
Date: Wed, 13 Jan 2010 10:42:26 +0100
Message-ID: <4B4D9582.5010806@amd.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

Keir Fraser wrote:
> On 12/01/2010 16:30, "Andre Przywara" wrote:
>
>> If we decided to not report memory-less nodes in physinfo we should also
>> skip them in the node_to_{cpu,memory,dma32_mem} Python lists. Currently
>> Xen will not start guests on machines with memory-less nodes which are
>> not the last ones. On an 8-node machine with empty nodes 4 and 5 "xm
>> info" was reporting wrongly, also the node assignment algorithm crashed
>> with a division by zero error.
>> The attached patch fixes this by skipping empty nodes in the enumeration
>> of resources.
>
> Where to begin? Firstly, I thought that the ordering of nodes in the
> node_to_* lists actually mattered -- the lists are indexed by nodeid (a
> handle which can be passed to other Xen interfaces) are they not? If you
> don't include empty entries, then the index position of entries is no longer
> meaningful.

OK, that seems to be an issue. To be honest I am not a fan of omitting
nodes from physinfo, but that is what the current code (RC1!) does and
it definitely breaks Xen on my box. So I just made this small patch to
make it work again.
Actually I would opt to revert the patch cropping the number of nodes
reported by physinfo (20762:a1d0a575b4ba ?).
Yes, that would result in nodes reported with zero memory, but in my
tests this did not raise problems, as a node's memory can (and will) be
exhausted even during normal operation.

To illustrate the problem:
My box has 8 nodes, I removed the memory from nodes 4 & 5. With the
unpatched version xm info says:

total_memory           : 73712
free_memory            : 70865
node_to_cpu            : node0:0-5,24-35
                         node1:6-11
                         node2:12-17
                         node3:18-23
                         node4:no cpus
                         node5:no cpus
node_to_memory         : node0:14267
                         node1:8167
                         node2:16335
                         node3:8167
                         node4:0
                         node5:0

So this listing completely omits the last two nodes (CPUs 36-47 and the
24 GB connected to them). The debug key triggered Xen-internal listing
is correct, though:

(XEN) idx0 -> NODE0 start->0 size->4423680
(XEN) phys_to_nid(0000000000001000) -> 0 should be 0
(XEN) idx1 -> NODE1 start->4423680 size->2097152
(XEN) phys_to_nid(0000000438001000) -> 1 should be 1
(XEN) idx2 -> NODE2 start->6520832 size->4194304
(XEN) phys_to_nid(0000000638001000) -> 2 should be 2
(XEN) idx3 -> NODE3 start->10715136 size->2097152
(XEN) phys_to_nid(0000000a38001000) -> 3 should be 3
(XEN) idx6 -> NODE6 start->12812288 size->4194304
(XEN) phys_to_nid(0000000c38001000) -> 6 should be 6
(XEN) idx7 -> NODE7 start->17006592 size->2097152
(XEN) phys_to_nid(0000001038001000) -> 7 should be 7

With the patched xc.so xm info reports:

node_to_cpu            : node0:0-5,24-35
                         node1:6-11
                         node2:12-17
                         node3:18-23
                         node4:36-41
                         node5:42-47
node_to_memory         : node0:14267
                         node1:8167
                         node2:16335
                         node3:8167
                         node4:16335
                         node5:7590

Although memory less nodes are not very common, it could happen
sometimes with our new dual-node processor, where one could (even
accidentally) forget to populate certain memory slots, as it has in fact
a dual-node dual-channel memory interface.

> Secondly, you avoid appending to the node_to_cpu list if the node is
> cpu-less. But you avoid appending to the node_to_{memory,dma32} lists only
> if the node is *both* cpu-less and memory-less. That's not even consistent.

OK, that's a point.
I see that the value of node_exists can change.

> Please just fix the crap Python code.

What part do you exactly mean? The part triggering the division by zero?
I will see if I can fix this properly.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12

----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632