From: Dario Faggioli <dario.faggioli@citrix.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
Wei Liu <Wei.Liu2@citrix.com>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
David Vrabel <david.vrabel@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Thu, 16 Jul 2015 12:32:42 +0200
Message-ID: <1437042762.28251.18.camel@citrix.com>
Hey,
This started on IRC, but it's actually appropriate to have the
conversation here.
I just discovered an issue with vNUMA when PV guests are used. After
creating a 4-vCPU PV guest and arranging things so that all 4 vCPUs
should be busy, I see this:
root@Zhaman:~# xl vcpu-list test
Name    ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
test     4     0     5  r--      1481.9  all / 0-7
test     4     1     2  r--      1479.4  all / 0-7
test     4     2    15  -b-         7.5  all / 8-15
test     4     3    10  -b-      1324.8  all / 8-15
Checking inside the guest confirms that *everything* runs on vCPUs 0
and 1. However, using schedtool or taskset, I can force tasks to
execute on vCPUs 2 and 3.
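For reference, the same kind of pinning can be done from a script. The following is a minimal sketch, assuming a Linux guest with Python 3 available; `pin_to_cpus` is a hypothetical helper name, not something from schedtool or taskset.

```python
import os

def pin_to_cpus(cpus, pid=0):
    """Restrict `pid` (0 means the calling process) to the given CPUs.

    Returns the affinity set the kernel reports afterwards, so the caller
    can confirm that the request took effect.
    """
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)
```

Inside the guest described above, `pin_to_cpus({2, 3})` would confine the calling process to vCPUs 2 and 3, roughly what `taskset` does for a command.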
Inspecting the guest's dmesg, I've seen this:
[ 0.128416] ------------[ cut here ]------------
[ 0.128416] WARNING: CPU: 2 PID: 0 at ../arch/x86/kernel/smpboot.c:317 topology_sane.isra.2+0x74/0x88()
[ 0.128416] sched: CPU #2's smt-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[ 0.128416] Modules linked in:
[ 0.128416] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0+ #1
[ 0.128416] 0000000000000009 ffff88001ee3bdd0 ffffffff81657c7b ffffffff810bbd2c
[ 0.128416] ffff88001ee3be20 ffff88001ee3be10 ffffffff81081510 ffff88001ee3bea0
[ 0.128416] ffffffff8103aa02 ffff88003ea0a001 0000000000000000 ffff88001f20a040
[ 0.128416] Call Trace:
[ 0.128416] [<ffffffff81657c7b>] dump_stack+0x4f/0x7b
[ 0.128416] [<ffffffff810bbd2c>] ? up+0x39/0x3e
[ 0.128416] [<ffffffff81081510>] warn_slowpath_common+0xa1/0xbb
[ 0.128416] [<ffffffff8103aa02>] ? topology_sane.isra.2+0x74/0x88
[ 0.128416] [<ffffffff81081570>] warn_slowpath_fmt+0x46/0x48
[ 0.128416] [<ffffffff8101eeb1>] ? __cpuid.constprop.0+0x15/0x19
[ 0.128416] [<ffffffff8103aa02>] topology_sane.isra.2+0x74/0x88
[ 0.128416] [<ffffffff8103ac70>] set_cpu_sibling_map+0x21a/0x444
[ 0.128416] [<ffffffff81056ac3>] ? numa_add_cpu+0x98/0x9f
[ 0.128416] [<ffffffff8100b8f2>] cpu_bringup+0x63/0xa8
[ 0.128416] [<ffffffff8100b945>] cpu_bringup_and_idle+0xe/0x1a
[ 0.128416] ---[ end trace 95bff1aef57ee1b1 ]---
So, basically, Linux is complaining that we're trying to put two vCPUs
that look like SMT siblings on different NUMA nodes. And, yes, I think
this is quite disruptive for the Linux scheduler's internal logic.
The vnuma bits of the guest config are these:
vnuma = [ [ "pnode=0","size=512","vcpus=0-1","vdistances=10,20" ],
[ "pnode=1","size=512","vcpus=2-3","vdistances=20,10" ] ]
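To make the intended layout explicit, here is a small sketch that turns the vnuma list above into a vCPU-to-vnode map. It is only an illustration: `parse_cpu_list`, `parse_vnuma` and `vcpu_to_vnode` are hypothetical helpers, not part of xl or libxl, and the parser handles only plain lists and ranges such as "0-1".

```python
def parse_cpu_list(spec):
    """Expand a string such as "0-1" or "0,2-3" into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def parse_vnuma(vnuma):
    """Convert the list-of-lists vnuma spec into one dict per virtual node."""
    nodes = []
    for vnode in vnuma:
        fields = dict(item.split("=", 1) for item in vnode)
        nodes.append({
            "pnode": int(fields["pnode"]),
            "size": int(fields["size"]),
            "vcpus": parse_cpu_list(fields["vcpus"]),
            "vdistances": [int(d) for d in fields["vdistances"].split(",")],
        })
    return nodes

def vcpu_to_vnode(nodes):
    """Map each vCPU to the index of the virtual node that lists it."""
    return {vcpu: idx for idx, node in enumerate(nodes) for vcpu in node["vcpus"]}

# The configuration quoted above, written as Python literals.
vnuma = [["pnode=0", "size=512", "vcpus=0-1", "vdistances=10,20"],
         ["pnode=1", "size=512", "vcpus=2-3", "vdistances=20,10"]]

nodes = parse_vnuma(vnuma)
print(vcpu_to_vnode(nodes))
```

With this configuration, vCPUs 0 and 1 belong to vnode 0 and vCPUs 2 and 3 to vnode 1, which is the split the guest is expected to see.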
From inside the guest, the topology looks like this:
root@test:~# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 475 MB
node 0 free: 382 MB
node 1 cpus: 2 3
node 1 size: 495 MB
node 1 free: 475 MB
node distances:
node   0   1
  0:  10  10
  1:  20  10
root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0-1
root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
0-3
root@test:~# cat /sys/devices/system/cpu/cpu2/topology/thread_siblings_list
2-3
root@test:~# cat /sys/devices/system/cpu/cpu2/topology/core_siblings_list
0-3
So the complaint during boot seems to be against 'core_siblings' (which
was not what I expected, but perhaps I misremember the meaning of
"core_siblings" vs. "thread_siblings" vs. smt-siblings in Linux; I'll
double check).
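The observation can be checked mechanically against the numbers quoted above. This is a rough sketch, not a reimplementation of the kernel's topology_sane(): `cross_node_siblings` is a hypothetical helper, and the sets are copied by hand from the numactl and sysfs output.

```python
def cross_node_siblings(siblings, node_cpus):
    """Return (cpu, sibling) pairs whose two CPUs sit on different nodes."""
    node_of = {cpu: node for node, cpus in node_cpus.items() for cpu in cpus}
    pairs = []
    for cpu, sibs in sorted(siblings.items()):
        for sib in sorted(sibs):
            if node_of[sib] != node_of[cpu]:
                pairs.append((cpu, sib))
    return pairs

# From `numactl --hardware`: node 0 has CPUs 0 and 1, node 1 has CPUs 2 and 3.
node_cpus = {0: {0, 1}, 1: {2, 3}}

# From sysfs: thread_siblings_list is "0-1" for cpu0 and "2-3" for cpu2.
thread_siblings = {0: {0, 1}, 2: {2, 3}}

# From sysfs: core_siblings_list is "0-3" for both cpu0 and cpu2.
core_siblings = {0: {0, 1, 2, 3}, 2: {0, 1, 2, 3}}

print(cross_node_siblings(thread_siblings, node_cpus))
print(cross_node_siblings(core_siblings, node_cpus))
```

With these values the thread sibling sets stay within a node, while the core sibling sets span both nodes, which matches the mismatch described above.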
Anyway, is there anything we can do to fix or work around this?
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)