* Spurious "NUMA placement failed, performance might be affected" message on ARM
From: Ian Campbell @ 2014-04-25 10:37 UTC
To: Dario Faggioli; +Cc: xen-devel

Hi Dario,

When starting a guest on ARM I'm seeing (with no additional verbosity):
libxl: notice: libxl_numa.c:494:libxl__get_numa_candidate: NUMA placement failed, performance might be affected

Which is a bit strong for a non-NUMA system.

Is there some hypercall we need to stub out or return -ENOSYS from to
cause this function to decide that this is not a NUMA system?

Does the same message occur on non-NUMA x86 systems?

Ian.

* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
From: Dario Faggioli @ 2014-04-28 9:55 UTC
To: Ian Campbell; +Cc: xen-devel

On Fri, 2014-04-25 at 11:37 +0100, Ian Campbell wrote:
> Hi Dario,
>
> When starting a guest on ARM I'm seeing (with no additional verbosity):
> libxl: notice: libxl_numa.c:494:libxl__get_numa_candidate: NUMA placement failed, performance might be affected
>
> Which is a bit strong for a non-NUMA system.
>
Indeed.

> Is there some hypercall we need to stub out or return -ENOSYS from to
> cause this function to decide that this is not a NUMA system?
>
> Does the same message occur on non-NUMA x86 systems?
>
The message is printed inside libxl__get_numa_candidate() if no suitable
placement candidate is found. It does not happen on x86 non-NUMA boxes
because there is only 1 node there, so the set of possible combinations
of nodes is made up of only one element, which is deemed to be the best
possible solution very quickly.

While I wonder why that does not happen on ARM, a sensible solution
would be to bail earlier, if we find only one NUMA node exists, for
whatever arch. Would that be ok? If yes, I can arrange a patch pretty
easily, I think.

For figuring out why the different behavior... Do you have the output of
`xl info -n' on that box handy, by any chance?

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

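[Editorial sketch] The early bail-out proposed in the message above could look roughly like the following. This is only an illustration under stated assumptions: the helper names nr_node_combinations() and numa_placement_needed() are hypothetical and are not existing libxl functions, and the sketch assumes the caller already knows the host's node count (the nr_nodes value that `xl info -n' reports).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, NOT part of libxl: number of non-empty subsets of
 * nr_nodes nodes, i.e. how many node combinations a placement search
 * could have to look at in the worst case. */
static unsigned long nr_node_combinations(unsigned int nr_nodes)
{
    /* Keep the shift well defined for an unsigned long. */
    assert(nr_nodes < 8 * sizeof(unsigned long));
    return (1UL << nr_nodes) - 1;
}

/* Hypothetical helper, NOT part of libxl: with at most one node there is
 * nothing to choose between, so the search (and any warning about its
 * failure) can be skipped altogether. */
static bool numa_placement_needed(unsigned int nr_nodes)
{
    return nr_node_combinations(nr_nodes) > 1;
}
```

On a single-node host there is exactly one combination, so numa_placement_needed(1) is false and a caller could return before the candidate search ever runs.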
* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
From: Ian Campbell @ 2014-04-28 10:01 UTC
To: Dario Faggioli; +Cc: xen-devel

On Mon, 2014-04-28 at 11:55 +0200, Dario Faggioli wrote:
> While I wonder why that does not happen on ARM, a sensible solution
> would be to bail earlier, if we find only one NUMA node exists, for
> whatever arch. Would that be ok? If yes, I can arrange a patch pretty
> easily, I think.

I suppose if ARM is also reporting 1 node then for some reason this
check isn't hitting and moving it earlier won't help (not that it would
be a bad idea independently to optimise this a bit).

Perhaps ARM is reporting no NUMA nodes? The xl info -n output suggests
not, but where else should I check?

>
> For figuring out why the different behavior... Do you have the output of
> `xl info -n' on that box handy, by any chance?

Here you are:

# xl info -n
host                   : marilith-n0
release                : 3.14.0-arm-native-10322-g0903f4b
version                : #6 SMP Thu Apr 17 14:03:36 BST 2014
machine                : armv7l
nr_cpus                : 4
max_cpu_id             : 127
nr_nodes               : 1
cores_per_socket       : 1
threads_per_core       : 1
cpu_mhz                : 150
hw_caps                : 00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000
virt_caps              :
total_memory           : 8184
free_memory            : 6933
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       0        0        0
  2:       0        0        0
  3:       0        0        0
numa_info              :
node:    memsize    memfree    distances
   0:      4088       3861      20
xen_major              : 4
xen_minor              : 5
xen_extra              : -unstable
xen_version            : 4.5-unstable
xen_caps               : xen-3.0-armv7l
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0x200000
xen_changeset          : Thu Apr 24 16:14:47 2014 +0100 git:499282e-dirty
xen_commandline        : console=dtuart dtuart=/soc/serial@fff36000 dom0_mem=128M noreboot conswitch=x loglvl=all guest_loglvl_all
cc_compiler            : gcc (Debian 4.6.3-14) 4.6.3
cc_compile_by          : ianc
cc_compile_domain      : uk.xensource.com
cc_compile_date        : Fri Apr 25 13:38:05 BST 2014
xend_config_format     : 4

* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
From: Dario Faggioli @ 2014-04-28 10:25 UTC
To: Ian Campbell; +Cc: xen-devel

On Mon, 2014-04-28 at 11:01 +0100, Ian Campbell wrote:
> On Mon, 2014-04-28 at 11:55 +0200, Dario Faggioli wrote:
> > While I wonder why that does not happen on ARM, a sensible solution
> > would be to bail earlier, if we find only one NUMA node exists, for
> > whatever arch. Would that be ok? If yes, I can arrange a patch pretty
> > easily, I think.
>
> I suppose if ARM is also reporting 1 node then for some reason this
> check isn't hitting and moving it earlier won't help (not that it would
> be a bad idea independently to optimise this a bit).
>
Exactly. Honestly, I thought it was like that already, but I was
evidently mis-remembering. I'll do that as soon as we have sorted this
out.

> Perhaps ARM is reporting no NUMA nodes? The xl info -n output suggests
> not but where else should I check.
>
The output (which I removed) really looks similar to my non-NUMA x86
box.

As for what to check, in libxl__get_numa_candidate(), if it enters this
loop (and it should):

    for (comb_ok = comb_init(gc, &comb_iter, nr_suit_nodes, min_nodes);
         comb_ok;
         comb_ok = comb_next(comb_iter, nr_suit_nodes, min_nodes)) {

The only reason why it does not get to the end of the first iteration
(and set cndt_found to 1) seems to be one of these if-s:

        /* If there is not enough memory in this combination, skip it
         * and go generating the next one... */
        nodes_free_memkb = nodemap_to_free_memkb(ninfo, &nodemap);
        if (min_free_memkb && nodes_free_memkb < min_free_memkb)
            continue;

        /* And the same applies if this combination is short in cpus */
        nodes_cpus = nodemap_to_nr_cpus(tinfo, nr_cpus, suitable_cpumap,
                                        &nodemap);
        if (min_cpus && nodes_cpus < min_cpus)
            continue;

With only 1 node, the first one should really be false, or we shouldn't
be here. The second is, I think, possible, but very unlikely (does your
guest have more vCPUs than the host has pCPUs?).

I'd be happy to have a look myself, but I don't have an ARM (cross)
build environment ready right now... Perhaps this is the chance to get
one, though... Do you want me to? (provided I can access that or a
similar box)

Regards,
Dario

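[Editorial sketch] To make the two filters described above easier to experiment with, here is a small stand-alone model of them. It is not the libxl code: struct candidate and find_candidate() are hypothetical names invented for this illustration, each candidate is reduced to two pre-computed totals, and the model stops at the first acceptable candidate, whereas the real function goes on to compare candidates against each other.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one node combination: only the two totals
 * that the filters look at are kept. */
struct candidate {
    uint64_t free_memkb; /* free memory summed over the candidate's nodes */
    int nr_cpus;         /* usable pCPUs summed over the candidate's nodes */
};

/* Returns true if at least one candidate survives both filters. A zero
 * min_free_memkb or min_cpus disables the corresponding filter, as in
 * the quoted conditions. */
static bool find_candidate(const struct candidate *cands, int n,
                           uint64_t min_free_memkb, int min_cpus)
{
    for (int i = 0; i < n; i++) {
        /* Not enough free memory in this combination: skip it. */
        if (min_free_memkb && cands[i].free_memkb < min_free_memkb)
            continue;
        /* Not enough pCPUs in this combination: skip it. */
        if (min_cpus && cands[i].nr_cpus < min_cpus)
            continue;
        return true; /* simplification: first acceptable candidate wins */
    }
    return false;
}
```

With a single candidate, one failed filter is enough for the whole search to come back empty, which is the situation in which the "NUMA placement failed" notice gets printed.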
* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
From: Ian Campbell @ 2014-04-28 11:30 UTC
To: Dario Faggioli; +Cc: xen-devel

On Mon, 2014-04-28 at 11:01 +0100, Ian Campbell wrote:
[...]
> total_memory : 8184
> free_memory : 6933
[...]
> node: memsize memfree distances
> 0: 4088 3861 20

I think we are missing some RAM here...

and I have now noticed that I only see this for guests which have more
RAM than this memfree value (+/- some slop). The guests succeed because
there is actually RAM available.

I think this is enough for me to now track down the cause on the
hypervisor side. Thanks for your input.

Ian.

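[Editorial sketch] The observation above can be checked against the quoted figures with a little arithmetic. The snippet below is only a worked example: the constants are copied from the `xl info -n' output earlier in the thread, the function names are hypothetical, the units are assumed to be the megabytes that xl reports, and the "slop" mentioned in the message is ignored.

```c
#include <stdbool.h>

/* Figures copied from the xl info -n output quoted in this thread. */
enum {
    HOST_TOTAL_MB    = 8184, /* total_memory */
    HOST_FREE_MB     = 6933, /* free_memory */
    NODE0_MEMSIZE_MB = 4088, /* numa_info: node 0 memsize */
    NODE0_MEMFREE_MB = 3861  /* numa_info: node 0 memfree */
};

/* Memory the host reports in total but the only node does not account for. */
static int unaccounted_mb(void)
{
    return HOST_TOTAL_MB - NODE0_MEMSIZE_MB;
}

/* True when a guest of guest_mb is larger than node 0's reported free
 * memory (so placement finds no candidate and the notice is printed)
 * yet still fits in the host's free memory (so the guest starts). */
static bool warns_but_starts(int guest_mb)
{
    return guest_mb > NODE0_MEMFREE_MB && guest_mb <= HOST_FREE_MB;
}
```

Under these assumptions roughly half of the host's memory (8184 - 4088 = 4096) is absent from the per-node figures, and any guest sized between 3861 and 6933 would trigger the notice while still booting.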
* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
From: Andrew Cooper @ 2014-04-28 12:15 UTC
To: Ian Campbell; +Cc: Dario Faggioli, xen-devel

On 28/04/14 12:30, Ian Campbell wrote:
> On Mon, 2014-04-28 at 11:01 +0100, Ian Campbell wrote:
> [...]
>> total_memory : 8184
>> free_memory : 6933
> [...]
>> node: memsize memfree distances
>> 0: 4088 3861 20
> I think we are missing some RAM here...
>
> and I have now noticed that I only see this for guests which have more
> RAM than this memfree value (+/- some slop). The guests succeed because
> there is actually RAM available.
>
> I think this is enough for me to now track down the cause on the
> hypervisor side. Thanks for your input.
>
> Ian.

Do be aware that the memsize value is "number of pages on this node"
which includes IO mappings of non-ram regions, and as a result memfree
is mostly fictitious.

I raised this as a concern with the hwloc code, but without any
subsequent discussion.

~Andrew

* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
From: Dario Faggioli @ 2014-04-29 15:36 UTC
To: Andrew Cooper; +Cc: Ian Campbell, xen-devel

On Mon, 2014-04-28 at 13:15 +0100, Andrew Cooper wrote:
> On 28/04/14 12:30, Ian Campbell wrote:
> > On Mon, 2014-04-28 at 11:01 +0100, Ian Campbell wrote:
> > [...]
> >> total_memory : 8184
> >> free_memory : 6933
> > [...]
> >> node: memsize memfree distances
> >> 0: 4088 3861 20
> > I think we are missing some RAM here...
> >
> > and I have now noticed that I only see this for guests which have more
> > RAM than this memfree value (+/- some slop). The guests succeed because
> > there is actually RAM available.
> >
> > I think this is enough for me to now track down the cause on the
> > hypervisor side. Thanks for your input.
> >
> > Ian.
>
> Do be aware that the memsize value is "number of pages on this node"
> which includes IO mappings of non-ram regions, and as a result memfree
> is mostly fictitious.
>
> I raised this as a concern with the hwloc code, but without any
> subsequent discussion.
>
There has been another discussion about discrepancies in total and free
memory reporting. ISTR, it was noticed at the time when I added the
support for some of this NUMA stuff in libvirt.

I don't have it handy right now... I'll see if I can fetch it back,
merge it with your observations in the hwloc thread and (re)start a
discussion about this.

Regards,
Dario

* Re: Spurious "NUMA placement failed, performance might be affected" message on ARM
From: Dario Faggioli @ 2014-04-28 12:23 UTC
To: Ian Campbell; +Cc: xen-devel

On Mon, 2014-04-28 at 12:30 +0100, Ian Campbell wrote:
> On Mon, 2014-04-28 at 11:01 +0100, Ian Campbell wrote:
> [...]
> > total_memory : 8184
> > free_memory : 6933
> [...]
> > node: memsize memfree distances
> > 0: 4088 3861 20
>
> I think we are missing some RAM here...
>
> and I have now noticed that I only see this for guests which have more
> RAM than this memfree value (+/- some slop). The guests succeed because
> there is actually RAM available.
>
Oh, yes, that's the first 'if' being true then.

Anyway, I'll go ahead and make a patch for skipping all this in case
there is only 1 node.

> I think this is enough for me to now track down the cause on the
> hypervisor side.
>
Indeed. From what you say, it looks like XEN_SYSCTL_numainfo is 'hiding'
something on ARM...

> Thanks for your input.
>
:-)

Regards,
Dario