* Spurious "NUMA placement failed, performance might be affected" message on ARM
@ 2014-04-25 10:37 Ian Campbell
2014-04-28 9:55 ` Dario Faggioli
0 siblings, 1 reply; 8+ messages in thread
From: Ian Campbell @ 2014-04-25 10:37 UTC (permalink / raw)
To: Dario Faggioli; +Cc: xen-devel
Hi Dario,
When starting a guest on ARM I'm seeing (with no additional verbosity):
libxl: notice: libxl_numa.c:494:libxl__get_numa_candidate: NUMA placement failed, performance might be affected
Which is a bit strong for a non-NUMA system.
Is there some hypercall we need to stub out or return -ENOSYS from to
cause this function to decide that this is not a NUMA system?
Does the same message occur on non-NUMA x86 systems?
Ian.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
2014-04-25 10:37 Spurious "NUMA placement failed, performance might be affected" message on ARM Ian Campbell
@ 2014-04-28 9:55 ` Dario Faggioli
2014-04-28 10:01 ` Ian Campbell
0 siblings, 1 reply; 8+ messages in thread
From: Dario Faggioli @ 2014-04-28 9:55 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel
On Fri, 2014-04-25 at 11:37 +0100, Ian Campbell wrote:
> Hi Dario,
>
> When starting a guest on ARM I'm seeing (with no additional verbosity):
> libxl: notice: libxl_numa.c:494:libxl__get_numa_candidate: NUMA placement failed, performance might be affected
>
> Which is a bit strong for a non-NUMA system.
>
Indeed.
> Is there some hypercall we need to stub out or return -ENOSYS from to
> cause this function to decide that this is not a NUMA system?
>
> Does the same message occur on non-NUMA x86 systems?
>
The message is printed inside libxl__get_numa_candidate() if no suitable
placement candidate is found. It does not happen on x86 non-NUMA boxes
as what happens there is that there is only 1 node, so the set of
possible combinations of nodes is made up of only one element, which is
deemed to be the best possible solution very quickly.
While I wonder why that does not happen on ARM, a sensible solution
would be to bail out earlier if we find that only one NUMA node exists, on
whatever arch. Would that be ok? If yes, I can arrange a patch pretty
easily, I think.
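
A minimal sketch of the early bail-out idea described above, not the
actual libxl patch. The enum, the function name try_placement() and its
parameters are hypothetical stand-ins: nr_nodes is the node count the
host reports, and candidate_found stands for the outcome of the
(elided) candidate search.

```c
/* Hypothetical outcome of a placement attempt (not a libxl type). */
enum placement_result {
    PLACEMENT_SKIPPED,  /* only one node: nothing to choose between */
    PLACEMENT_FOUND,    /* a suitable candidate was found */
    PLACEMENT_FAILED    /* would trigger the "NUMA placement failed" notice */
};

/*
 * Hypothetical helper. nr_nodes is the number of NUMA nodes on the host;
 * candidate_found is non-zero if the candidate search (not shown here)
 * produced a suitable placement.
 */
static enum placement_result try_placement(int nr_nodes, int candidate_found)
{
    /* With zero or one node there is no placement decision to make,
     * so return before the search can report a failure. */
    if (nr_nodes <= 1)
        return PLACEMENT_SKIPPED;

    return candidate_found ? PLACEMENT_FOUND : PLACEMENT_FAILED;
}
```

With this shape, a single-node host never reaches the failure path,
whatever the search would have concluded: try_placement(1, 0) yields
PLACEMENT_SKIPPED rather than PLACEMENT_FAILED.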
As for figuring out why the behavior differs: do you have the output of
`xl info -n' on that box handy, by any chance?
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
2014-04-28 9:55 ` Dario Faggioli
@ 2014-04-28 10:01 ` Ian Campbell
2014-04-28 10:25 ` Dario Faggioli
2014-04-28 11:30 ` Ian Campbell
0 siblings, 2 replies; 8+ messages in thread
From: Ian Campbell @ 2014-04-28 10:01 UTC (permalink / raw)
To: Dario Faggioli; +Cc: xen-devel
On Mon, 2014-04-28 at 11:55 +0200, Dario Faggioli wrote:
> While I wonder why that does not happen on ARM, a sensible solution
> would be to bail earlier, if we find only one NUMA node exist, for
> whatever arch. Would that be ok? If yes, I can arrange a patch pretty
> easily, I think.
I suppose if ARM is also reporting 1 node then for some reason this
check isn't hitting and moving it earlier won't help (not that it would
be a bad idea independently to optimise this a bit).
Perhaps ARM is reporting no NUMA nodes? The xl info -n output suggests
not, but where else should I check?
>
> For figuring out why the different behavior... Do you have the output of
> `xl info -n' on that box handy, by any chance?
Here you are:
# xl info -n
host : marilith-n0
release : 3.14.0-arm-native-10322-g0903f4b
version : #6 SMP Thu Apr 17 14:03:36 BST 2014
machine : armv7l
nr_cpus : 4
max_cpu_id : 127
nr_nodes : 1
cores_per_socket : 1
threads_per_core : 1
cpu_mhz : 150
hw_caps : 00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000
virt_caps :
total_memory : 8184
free_memory : 6933
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
cpu_topology :
cpu:    core    socket    node
  0:       0         0       0
  1:       0         0       0
  2:       0         0       0
  3:       0         0       0
numa_info :
node:    memsize    memfree    distances
   0:       4088       3861    20
xen_major : 4
xen_minor : 5
xen_extra : -unstable
xen_version : 4.5-unstable
xen_caps : xen-3.0-armv7l
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0x200000
xen_changeset : Thu Apr 24 16:14:47 2014 +0100 git:499282e-dirty
xen_commandline : console=dtuart dtuart=/soc/serial@fff36000 dom0_mem=128M noreboot conswitch=x loglvl=all guest_loglvl_all
cc_compiler : gcc (Debian 4.6.3-14) 4.6.3
cc_compile_by : ianc
cc_compile_domain : uk.xensource.com
cc_compile_date : Fri Apr 25 13:38:05 BST 2014
xend_config_format : 4
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
2014-04-28 10:01 ` Ian Campbell
@ 2014-04-28 10:25 ` Dario Faggioli
2014-04-28 11:30 ` Ian Campbell
1 sibling, 0 replies; 8+ messages in thread
From: Dario Faggioli @ 2014-04-28 10:25 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel
On Mon, 2014-04-28 at 11:01 +0100, Ian Campbell wrote:
> On Mon, 2014-04-28 at 11:55 +0200, Dario Faggioli wrote:
> > While I wonder why that does not happen on ARM, a sensible solution
> > would be to bail earlier, if we find only one NUMA node exist, for
> > whatever arch. Would that be ok? If yes, I can arrange a patch pretty
> > easily, I think.
>
> I suppose if ARM is also reporting 1 node then for some reason this
> check isn't hitting and moving it earlier won't help (not that it would
> be a bad idea independently to optimise this a bit).
>
Exactly. Honestly, I thought it was like that already, but I was
evidently misremembering. I'll do that as soon as we have sorted
this out.
> Perhaps ARM is reporting no NUMA nodes? The xl info -n output suggests
> not but where else should I check.
>
The output (which I removed) really looks similar to that of my non-NUMA
x86 box.
As for what to check: in libxl__get_numa_candidate(), if it enters this
loop (and it should):
    for (comb_ok = comb_init(gc, &comb_iter, nr_suit_nodes, min_nodes);
         comb_ok;
         comb_ok = comb_next(comb_iter, nr_suit_nodes, min_nodes)) {
The only reason why it would not get to the end of the first iteration
(and set cndt_found to 1) seems to be one of these if-s:
        /* If there is not enough memory in this combination, skip it
         * and go generating the next one... */
        nodes_free_memkb = nodemap_to_free_memkb(ninfo, &nodemap);
        if (min_free_memkb && nodes_free_memkb < min_free_memkb)
            continue;
        /* And the same applies if this combination is short in cpus */
        nodes_cpus = nodemap_to_nr_cpus(tinfo, nr_cpus, suitable_cpumap,
                                        &nodemap);
        if (min_cpus && nodes_cpus < min_cpus)
            continue;
With only 1 node, the first one should really be false, or we shouldn't
be here. The second is, I think, possible, but very unlikely (does your
guest have more vCPUs than the host has pCPUs?).
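
To make the effect of those two checks concrete, here is a small
stand-alone sketch of the same filtering logic. It is not the libxl
code: struct candidate and both function names are hypothetical, and it
assumes each candidate has already been reduced to its total free
memory and usable pCPU count. As in the snippet above, a minimum of 0
means "no constraint".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one candidate (one combination of nodes). */
struct candidate {
    uint64_t free_memkb; /* free memory summed over the candidate's nodes */
    int nr_cpus;         /* usable pCPUs summed over the candidate's nodes */
};

/* Apply the two "continue" conditions; true means the candidate survives. */
static bool candidate_is_suitable(const struct candidate *c,
                                  uint64_t min_free_memkb, int min_cpus)
{
    if (min_free_memkb && c->free_memkb < min_free_memkb)
        return false; /* not enough free memory in this combination */
    if (min_cpus && c->nr_cpus < min_cpus)
        return false; /* not enough pCPUs in this combination */
    return true;
}

/* Plays the role of cndt_found: true if any candidate passes both checks. */
static bool any_candidate_found(const struct candidate *cands, int n,
                                uint64_t min_free_memkb, int min_cpus)
{
    for (int i = 0; i < n; i++)
        if (candidate_is_suitable(&cands[i], min_free_memkb, min_cpus))
            return true;
    return false;
}
```

On a single-node host there is exactly one candidate, so one failed
check is enough for nothing to be found. For example, taking the 3861
and 4 pCPUs from the `xl info -n' output earlier in the thread (and
assuming that memfree figure is in MB), a hypothetical guest needing
4096 MB would fail the memory check, while one needing 2048 MB would
pass.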
I'd be happy to have a look myself, but I don't have an ARM (cross)
build environment ready right now... Perhaps this is the chance to get
one, though... Do you want me to? (provided I can access that or a
similar box)
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
2014-04-28 10:01 ` Ian Campbell
2014-04-28 10:25 ` Dario Faggioli
@ 2014-04-28 11:30 ` Ian Campbell
2014-04-28 12:15 ` Andrew Cooper
2014-04-28 12:23 ` Dario Faggioli
1 sibling, 2 replies; 8+ messages in thread
From: Ian Campbell @ 2014-04-28 11:30 UTC (permalink / raw)
To: Dario Faggioli; +Cc: xen-devel
On Mon, 2014-04-28 at 11:01 +0100, Ian Campbell wrote:
[...]
> total_memory : 8184
> free_memory : 6933
[...]
> node: memsize memfree distances
> 0: 4088 3861 20
I think we are missing some RAM here...
and I have now noticed that I only see this for guests which have more
RAM than this memfree value (+/- some slop). The guests succeed because
there is actually RAM available.
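
The arithmetic behind this observation can be written out as a short
sketch. The function names are hypothetical, the figures are the ones
from the `xl info -n' output quoted above, and the sketch assumes all of
them are in the same unit (MB). It ignores the "slop" mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory the host-wide figure reports but the per-node figures do not. */
static uint64_t unaccounted_mb(uint64_t host_mb, const uint64_t *node_mb,
                               int nr_nodes)
{
    uint64_t sum = 0;

    for (int i = 0; i < nr_nodes; i++)
        sum += node_mb[i];
    return host_mb > sum ? host_mb - sum : 0;
}

/*
 * True when a guest is too big for the per-node free figure (so placement
 * reports failure) yet still fits in the host-wide free memory (so the
 * guest starts anyway). This is the window in which the notice is spurious.
 */
static bool notice_but_guest_fits(uint64_t guest_mb, uint64_t node_free_mb,
                                  uint64_t host_free_mb)
{
    return guest_mb > node_free_mb && guest_mb <= host_free_mb;
}
```

With the quoted numbers, total_memory 8184 against a node memsize of
4088 leaves 4096 unaccounted for, and free_memory 6933 against a node
memfree of 3861 leaves 3072. A hypothetical guest anywhere between 3861
and 6933 falls into the window where the notice appears but the guest
still starts.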
I think this is enough for me to now track down the cause on the
hypervisor side. Thanks for your input.
Ian.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
2014-04-28 11:30 ` Ian Campbell
@ 2014-04-28 12:15 ` Andrew Cooper
2014-04-29 15:36 ` Dario Faggioli
2014-04-28 12:23 ` Dario Faggioli
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2014-04-28 12:15 UTC (permalink / raw)
To: Ian Campbell; +Cc: Dario Faggioli, xen-devel
On 28/04/14 12:30, Ian Campbell wrote:
> On Mon, 2014-04-28 at 11:01 +0100, Ian Campbell wrote:
> [...]
>> total_memory : 8184
>> free_memory : 6933
> [...]
>> node: memsize memfree distances
>> 0: 4088 3861 20
> I think we are missing some RAM here...
>
> and I have now noticed that I only see this for guests which have more
> RAM than this memfree value (+/- some slop). The guests succeed because
> there is actually RAM available.
>
> I think this is enough for me to now track down the cause on the
> hypervisor side. Thanks for your input.
>
> Ian.
Do be aware that the memsize value is "number of pages on this node"
which includes IO mappings of non-ram regions, and as a result memfree
is mostly fictitious.
I raised this as a concern with the hwloc code, but without any
subsequent discussion.
~Andrew
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
2014-04-28 11:30 ` Ian Campbell
2014-04-28 12:15 ` Andrew Cooper
@ 2014-04-28 12:23 ` Dario Faggioli
1 sibling, 0 replies; 8+ messages in thread
From: Dario Faggioli @ 2014-04-28 12:23 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel
On Mon, 2014-04-28 at 12:30 +0100, Ian Campbell wrote:
> On Mon, 2014-04-28 at 11:01 +0100, Ian Campbell wrote:
> [...]
> > total_memory : 8184
> > free_memory : 6933
> [...]
> > node: memsize memfree distances
> > 0: 4088 3861 20
>
> I think we are missing some RAM here...
>
> and I have now noticed that I only see this for guests which have more
> RAM than this memfree value (+/- some slop). The guests succeed because
> there is actually RAM available.
>
Oh, yes, that's the first 'if' being true then.
Anyway, I'll go ahead and make a patch for skipping all this in case
there is only 1 node.
> I think this is enough for me to now track down the cause on the
> hypervisor side.
>
Indeed. From what you say, it looks like XEN_SYSCTL_numainfo is 'hiding'
something on ARM...
> Thanks for your input.
>
:-)
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
2014-04-28 12:15 ` Andrew Cooper
@ 2014-04-29 15:36 ` Dario Faggioli
0 siblings, 0 replies; 8+ messages in thread
From: Dario Faggioli @ 2014-04-29 15:36 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Ian Campbell, xen-devel
On Mon, 2014-04-28 at 13:15 +0100, Andrew Cooper wrote:
> On 28/04/14 12:30, Ian Campbell wrote:
> > On Mon, 2014-04-28 at 11:01 +0100, Ian Campbell wrote:
> > [...]
> >> total_memory : 8184
> >> free_memory : 6933
> > [...]
> >> node: memsize memfree distances
> >> 0: 4088 3861 20
> > I think we are missing some RAM here...
> >
> > and I have now noticed that I only see this for guests which have more
> > RAM than this memfree value (+/- some slop). The guests succeed because
> > there is actually RAM available.
> >
> > I think this is enough for me to now track down the cause on the
> > hypervisor side. Thanks for your input.
> >
> > Ian.
>
> Do be aware that the memsize value is "number of pages on this node"
> which includes IO mappings of non-ram regions, and as a result memfree
> is mostly fictitious.
>
> I raised this as a concern with the hwloc code, but without any
> subsequent discussion.
>
There has been another discussion about discrepancies in total and free
memory reporting. ISTR it was noticed at the time when I added
support for some of this NUMA stuff in libvirt.
I don't have it handy right now... I'll see if I can fetch it back,
merge it with your observations in the hwloc thread and (re)start a
discussion about this.
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-04-29 15:36 UTC | newest]
Thread overview: 8+ messages
2014-04-25 10:37 Spurious "NUMA placement failed, performance might be affected" message on ARM Ian Campbell
2014-04-28 9:55 ` Dario Faggioli
2014-04-28 10:01 ` Ian Campbell
2014-04-28 10:25 ` Dario Faggioli
2014-04-28 11:30 ` Ian Campbell
2014-04-28 12:15 ` Andrew Cooper
2014-04-29 15:36 ` Dario Faggioli
2014-04-28 12:23 ` Dario Faggioli