From: Dario Faggioli <dario.faggioli@citrix.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
Wei Liu <Wei.Liu2@citrix.com>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
David Vrabel <david.vrabel@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Thu, 16 Jul 2015 12:32:42 +0200
Message-ID: <1437042762.28251.18.camel@citrix.com>
Hey,
This started on IRC, but it's actually appropriate to have the
conversation here.
I just discovered an issue with vNUMA when PV guests are used. After
creating a 4-vCPU PV guest and loading it so that all 4 vCPUs should be
busy, I see this:
root@Zhaman:~# xl vcpu-list test
Name    ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
test     4     0     5   r--     1481.9  all / 0-7
test     4     1     2   r--     1479.4  all / 0-7
test     4     2    15   -b-        7.5  all / 8-15
test     4     3    10   -b-     1324.8  all / 8-15
Checking inside the guest confirms that *everything* runs on vCPUs 0
and 1. However, using schedtool or taskset, I can force tasks to
execute on vCPUs 2 and 3.
Inspecting the guest's dmesg, I've seen this:
[ 0.128416] ------------[ cut here ]------------
[ 0.128416] WARNING: CPU: 2 PID: 0 at ../arch/x86/kernel/smpboot.c:317 topology_sane.isra.2+0x74/0x88()
[ 0.128416] sched: CPU #2's smt-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[ 0.128416] Modules linked in:
[ 0.128416] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0+ #1
[ 0.128416] 0000000000000009 ffff88001ee3bdd0 ffffffff81657c7b ffffffff810bbd2c
[ 0.128416] ffff88001ee3be20 ffff88001ee3be10 ffffffff81081510 ffff88001ee3bea0
[ 0.128416] ffffffff8103aa02 ffff88003ea0a001 0000000000000000 ffff88001f20a040
[ 0.128416] Call Trace:
[ 0.128416] [<ffffffff81657c7b>] dump_stack+0x4f/0x7b
[ 0.128416] [<ffffffff810bbd2c>] ? up+0x39/0x3e
[ 0.128416] [<ffffffff81081510>] warn_slowpath_common+0xa1/0xbb
[ 0.128416] [<ffffffff8103aa02>] ? topology_sane.isra.2+0x74/0x88
[ 0.128416] [<ffffffff81081570>] warn_slowpath_fmt+0x46/0x48
[ 0.128416] [<ffffffff8101eeb1>] ? __cpuid.constprop.0+0x15/0x19
[ 0.128416] [<ffffffff8103aa02>] topology_sane.isra.2+0x74/0x88
[ 0.128416] [<ffffffff8103ac70>] set_cpu_sibling_map+0x21a/0x444
[ 0.128416] [<ffffffff81056ac3>] ? numa_add_cpu+0x98/0x9f
[ 0.128416] [<ffffffff8100b8f2>] cpu_bringup+0x63/0xa8
[ 0.128416] [<ffffffff8100b945>] cpu_bringup_and_idle+0xe/0x1a
[ 0.128416] ---[ end trace 95bff1aef57ee1b1 ]---
So, basically, Linux is complaining that we're trying to put two vCPUs
that look like SMT siblings on different NUMA nodes. And, yes, I think
this is quite disruptive for the internal logic of the Linux scheduler.
The vnuma bits of the guest config are these:
vnuma = [ [ "pnode=0","size=512","vcpus=0-1","vdistances=10,20" ],
          [ "pnode=1","size=512","vcpus=2-3","vdistances=20,10" ] ]
From inside the guest, the topology looks like this:
root@test:~# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 475 MB
node 0 free: 382 MB
node 1 cpus: 2 3
node 1 size: 495 MB
node 1 free: 475 MB
node distances:
node 0 1
0: 10 10
1: 20 10
root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0-1
root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
0-3
root@test:~# cat /sys/devices/system/cpu/cpu2/topology/thread_siblings_list
2-3
root@test:~# cat /sys/devices/system/cpu/cpu2/topology/core_siblings_list
0-3
So the complaint during boot seems to be against 'core_siblings' (which
was not what I expected, but perhaps I misremember the meaning of
"core_siblings" vs. "thread_siblings" vs. smt-siblings in Linux; I'll
double check).
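To make that comparison less error prone, here is a minimal Python sketch
that cross-checks sibling lists against NUMA node membership. It is only
an illustration: parse_cpulist() and cross_node_groups() are hypothetical
helper names, and the input values are typed in by hand from the output
above (with "0 1" from numactl written as the cpulist "0-1") rather than
read from sysfs.

```python
def parse_cpulist(text):
    """Expand a sysfs-style cpulist such as "0-1,4" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cross_node_groups(siblings, node_cpus):
    """Return the sibling groups whose CPUs belong to more than one NUMA node.

    siblings:  {cpu: cpulist string}, e.g. the content of thread_siblings_list
    node_cpus: {node: cpulist string}, e.g. the "node N cpus" lines of numactl
    Result:    {cpu: sorted list of nodes its sibling group spans}
    """
    cpu_to_node = {}
    for node, cpulist in node_cpus.items():
        for cpu in parse_cpulist(cpulist):
            cpu_to_node[cpu] = node

    offenders = {}
    for cpu, cpulist in siblings.items():
        group = parse_cpulist(cpulist)
        nodes = {cpu_to_node[c] for c in group if c in cpu_to_node}
        if len(nodes) > 1:
            offenders[cpu] = sorted(nodes)
    return offenders


# Values copied by hand from the guest output above (assumption: no typos).
node_cpus = {0: "0-1", 1: "2-3"}
thread_siblings = {0: "0-1", 2: "2-3"}
core_siblings = {0: "0-3", 2: "0-3"}

print("thread siblings spanning nodes:", cross_node_groups(thread_siblings, node_cpus))
print("core siblings spanning nodes:  ", cross_node_groups(core_siblings, node_cpus))
```

With the values above, the thread sibling check comes back empty, while
the core sibling check reports that the groups of CPU 0 and CPU 2 both
span nodes 0 and 1.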
Anyway, is there anything we can do to fix or work around things?
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)