From: Andre Przywara <andre.przywara@amd.com>
To: Keir Fraser <keir.fraser@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [PATCH] numa: fix problems with memory-less nodes
Date: Wed, 13 Jan 2010 10:42:26 +0100
Message-ID: <4B4D9582.5010806@amd.com>
In-Reply-To: <C773342E.632A%keir.fraser@eu.citrix.com>

Keir Fraser wrote:
> On 12/01/2010 16:30, "Andre Przywara" <andre.przywara@amd.com> wrote:
> 
>> If we decided to not report memory-less nodes in physinfo we should also
>> skip them in the node_to_{cpu,memory,dma32_mem} Python lists. Currently
>> Xen will not start guests on machines with memory-less nodes which are
>> not the last ones. On an 8-node machine with empty nodes 4 and 5 "xm
>> info" was reporting wrongly, also the node assignment algorithm crashed
>> with a division by zero error.
>> The attached patch fixes this by skipping empty nodes in the enumeration
>> of resources.
> 
> Where to begin? Firstly, I thought that the ordering of nodes in the
> node_to_* lists actually mattered -- the lists are indexed by nodeid (a
> handle which can be passed to other Xen interfaces) are they not? If you
> don't include empty entries, then the index position of entries is no longer
> meaningful.
OK, that seems to be an issue.
To be honest, I am not a fan of omitting nodes from physinfo, but that
is what the current code (RC1!) does, and it definitely breaks Xen on my
box. So I just made this small patch to make it work again.
Actually, I would opt to revert the patch that crops the number of nodes
reported by physinfo (20762:a1d0a575b4ba ?). Yes, that would result in
nodes being reported with zero memory, but in my tests this did not
cause problems, since a node's memory can (and will) be exhausted even
during normal operation.
To illustrate the problem:
My box has 8 nodes; I removed the memory from nodes 4 and 5.
With the unpatched version, xm info says:
total_memory           : 73712
free_memory            : 70865
node_to_cpu            : node0:0-5,24-35
                          node1:6-11
                          node2:12-17
                          node3:18-23
                          node4:no cpus
                          node5:no cpus
node_to_memory         : node0:14267
                          node1:8167
                          node2:16335
                          node3:8167
                          node4:0
                          node5:0
So this listing completely omits the last two nodes (CPUs 36-47 and the
24 GB connected to them). The Xen-internal listing triggered by the
debug key is correct, though:
(XEN) idx0 -> NODE0 start->0 size->4423680
(XEN) phys_to_nid(0000000000001000) -> 0 should be 0
(XEN) idx1 -> NODE1 start->4423680 size->2097152
(XEN) phys_to_nid(0000000438001000) -> 1 should be 1
(XEN) idx2 -> NODE2 start->6520832 size->4194304
(XEN) phys_to_nid(0000000638001000) -> 2 should be 2
(XEN) idx3 -> NODE3 start->10715136 size->2097152
(XEN) phys_to_nid(0000000a38001000) -> 3 should be 3
(XEN) idx6 -> NODE6 start->12812288 size->4194304
(XEN) phys_to_nid(0000000c38001000) -> 6 should be 6
(XEN) idx7 -> NODE7 start->17006592 size->2097152
(XEN) phys_to_nid(0000001038001000) -> 7 should be 7
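
As a cross-check of the "24 GB" figure, here is a small sketch that
converts the sizes from the listing above into MiB. It assumes that
start and size are counted in 4 KiB pages (the probed addresses are
consistent with that: 4423680 pages * 4096 = 0x438000000); the variable
and function names are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: start/size are given in 4 KiB pages

# (node id, start page, size in pages), copied from the debug-key listing
nodes = [
    (0, 0, 4423680),
    (1, 4423680, 2097152),
    (2, 6520832, 4194304),
    (3, 10715136, 2097152),
    (6, 12812288, 4194304),
    (7, 17006592, 2097152),
]


def pages_to_mib(pages):
    """Convert a page count into MiB."""
    return pages * PAGE_SIZE // (1024 * 1024)


# Size of each node in MiB, keyed by the real node id
sizes_mib = {nid: pages_to_mib(size) for nid, _start, size in nodes}

# Memory attached to the two nodes missing from the unpatched xm info
missing_mib = sizes_mib[6] + sizes_mib[7]

# Each range should begin where the previous one ends
contiguous = all(
    nodes[i][1] + nodes[i][2] == nodes[i + 1][1]
    for i in range(len(nodes) - 1)
)

print(sizes_mib, missing_mib, contiguous)
```

Under that assumption, nodes 6 and 7 together hold 24576 MiB, i.e. the
24 GB missing from the unpatched output.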
With the patched xc.so xm info reports:
node_to_cpu            : node0:0-5,24-35
                          node1:6-11
                          node2:12-17
                          node3:18-23
                          node4:36-41
                          node5:42-47
node_to_memory         : node0:14267
                          node1:8167
                          node2:16335
                          node3:8167
                          node4:16335
                          node5:7590
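
To make the indexing concern concrete, here is a minimal sketch of the
difference between compacting the list and keeping the node id as the
key. It assumes that "node4" and "node5" in the patched output are
really hardware nodes 6 and 7 (as the debug-key listing suggests); the
helper names are hypothetical and not taken from xc.so or xend.

```python
# Memory per node, indexed by real node id; nodes 4 and 5 are empty.
# Values mirror the patched listing and are only illustrative.
memory_by_nodeid = [14267, 8167, 16335, 8167, 0, 0, 16335, 7590]


def compact(values):
    """Drop empty entries; list positions no longer equal node ids."""
    return [v for v in values if v != 0]


def label(values, skip_empty=False):
    """Key every entry by its real node id, optionally hiding empty nodes."""
    return {
        "node%d" % nid: v
        for nid, v in enumerate(values)
        if not (skip_empty and v == 0)
    }


compacted = compact(memory_by_nodeid)
labelled = label(memory_by_nodeid, skip_empty=True)

# Position 4 of the compacted list actually describes node 6
print(compacted[4], labelled["node6"], "node4" in labelled)
```

With the compacted list, index 4 silently refers to node 6, so the
index can no longer be passed to other interfaces as a node id; keying
by node id avoids that while still hiding the empty nodes.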

Although memory-less nodes are not very common, they can occur with our
new dual-node processor: since it has in fact a dual-node, dual-channel
memory interface, one could (even accidentally) forget to populate
certain memory slots.

> Secondly, you avoid appending to the node_to_cpu list if the node is
> cpu-less. But you avoid appending to the node_to_{memory,dma32} lists only
> if the node is *both* cpu-less and memory-less. That's not even consistent.
OK, that's a point. I see that the value of node_exists can change.
> Please just fix the crap Python code.
Which part exactly do you mean? The part triggering the division by zero?

I will see if I can fix this properly.
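
A fix on the Python side could look roughly like the following sketch.
It is not the existing xend code: the function name, its arguments and
the load metric are assumptions made for illustration. The point is
only that a node without CPUs is skipped instead of being used as a
divisor, while the lists stay indexed by node id.

```python
def least_loaded_node(node_to_cpu, vcpus_per_node):
    """Return the id of the node with the lowest VCPU load per CPU.

    node_to_cpu    -- list indexed by node id, each entry a list of CPUs
    vcpus_per_node -- list indexed by node id, VCPUs already placed there
    Nodes without CPUs are skipped, so len(cpus) is never zero when used
    as a divisor. Returns None if no node has any CPU.
    """
    best, best_load = None, None
    for nid, cpus in enumerate(node_to_cpu):
        if not cpus:
            continue  # empty node: nothing to schedule on
        load = vcpus_per_node[nid] / float(len(cpus))
        if best_load is None or load < best_load:
            best, best_load = nid, load
    return best


# Hypothetical layout: nodes 2 and 3 have no CPUs
node_to_cpu = [[0, 1], [2, 3], [], [], [4, 5]]
vcpus_per_node = [4, 2, 0, 0, 1]
print(least_loaded_node(node_to_cpu, vcpus_per_node))
```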

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


Thread overview: 5+ messages
2010-01-12 16:30 [PATCH] numa: fix problems with memory-less nodes Andre Przywara
2010-01-13  8:26 ` Keir Fraser
2010-01-13  9:42   ` Andre Przywara [this message]
2010-01-13  9:55     ` Keir Fraser
2010-01-13 10:02     ` Keir Fraser
