From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>,
"Keir (Xen.org)" <keir@xen.org>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Logical NUMA error during boot, and RFC patch
Date: Thu, 28 Jun 2012 11:29:31 +0100 [thread overview]
Message-ID: <4FEC320B.7050309@citrix.com> (raw)
In-Reply-To: <4FEC454C020000780008C6A4@nat28.tlf.novell.com>
On 28/06/12 10:51, Jan Beulich wrote:
>>>> On 27.06.12 at 21:10, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> XenServer have recently acquired a quad-socket AMD Interlagos server and
>> I have been playing around with it, and discovered a logical error in
>> how Xen detects numa nodes.
>>
>> The server has 8 NUMA nodes, 4 of which have memory attached (the even
>> nodes - see SRAT.dsl attached). This means that node_set_online(nodeid)
>> gets called only for the nodes which have memory attached. Later, in
>> srat_detect_node(), node gets set to 0 if it was NUMA_NO_NODE or if
>> node_online() is false. This leads to all the processors on the odd
>> nodes being assigned to node 0, even though the odd nodes are present
>> (see interlagos-xl-info-n.log).
>>
>> I present an RFC patch which changes srat_detect_node() to call
>> node_set_online() for each node, which appears to fix the logic.
>>
>> Is this a sensible place to set the node online, or is there a better
>> way to fix this logic?
> While the place looks sensible, it has the potential problem of
> adding bits to the online map pretty late in the game.
>
> As the memory-related invocations of node_set_online() come
> out of numa_initmem_init()/acpi_scan_nodes(), perhaps the
> (boot time) CPU-related ones should be done there too (I'd
> still keep the adjustment you're already doing, to also cover
> hotplug CPUs)?
>
> Jan
>
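For reference, a minimal sketch of the fallback I was describing (not the
verbatim Xen code; the real function's signature differs, but
apicid_to_node[], NUMA_NO_NODE, node_online() and numa_set_node() are the
usual names):

    static void srat_detect_node(unsigned int cpu, unsigned int apicid)
    {
        unsigned int node = apicid_to_node[apicid];

        /*
         * Only nodes with memory attached were marked online, so a CPU
         * sitting on a memory-less (odd) node falls through to node 0.
         */
        if ( node == NUMA_NO_NODE || !node_online(node) )
            node = 0;

        numa_set_node(cpu, node);
    }

The RFC patch simply brings the CPU's node online before that check:

        if ( node != NUMA_NO_NODE )
            node_set_online(node);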
I have been doing quite a bit more testing this morning, and have come
to some sad conclusions.
This specific server is a Dell R815 loaner with 8x4GiB DIMMs, 2 DIMMs
hanging off each socket. As each socket is an Interlagos processor,
there are 4 memory controllers (with 8 DIMM slots, as they are dual
channel).
What this means is that, per socket, one node has half of its available
DIMM slots populated and the other node has no memory at all. The
performance implications are severe, but as it appears that almost all
of the RAM configurations you can select on the Dell website will lead
to poor or worse performance, I can foresee many systems like this in
the future. (I don't wish to single Dell out here; it just happens to
be the vendor of the server I am testing, and other server vendors
suffer from the same issue.)
As to the problem at hand, I will investigate the NUMA code some more
and see about setting the node-online map up earlier in boot; a rough
sketch of the direction I have in mind is below.
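Something along these lines is what I have in mind for the boot-time
side (a rough sketch only: it assumes apicid_to_node[] is already
populated from the SRAT processor affinity entries by the time
acpi_scan_nodes() runs, and the helper name and exact hook point are
invented):

    static void __init srat_set_cpu_nodes_online(void)
    {
        unsigned int apic;

        /* Bring every node which has a CPU online, memory or not. */
        for ( apic = 0; apic < MAX_LOCAL_APIC; apic++ )
        {
            unsigned int node = apicid_to_node[apic];

            if ( node != NUMA_NO_NODE )
                node_set_online(node);
        }
    }

Combined with keeping the node_set_online() call in srat_detect_node()
for hotplugged CPUs, that should cover both cases you mention above.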
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
Thread overview: 4+ messages
2012-06-27 19:10 Logical NUMA error during boot, and RFC patch Andrew Cooper
2012-06-28 9:51 ` Jan Beulich
2012-06-28 10:29 ` Andrew Cooper [this message]
2012-06-28 12:29 ` George Dunlap