From: Dario Faggioli
Subject: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Thu, 16 Jul 2015 12:32:42 +0200
Message-ID: <1437042762.28251.18.camel@citrix.com>
To: "xen-devel@lists.xenproject.org"
Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, David Vrabel, Jan Beulich, Boris Ostrovsky

Hey,

This started on IRC, but it's actually more appropriate to have the conversation here.

I've just discovered an issue with vNUMA when PV guests are used. Creating a 4-vCPU PV guest, and setting things up so that all 4 vCPUs should be busy, I see this:

root@Zhaman:~# xl vcpu-list test
Name                  ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
test                   4     0     5   r--    1481.9  all / 0-7
test                   4     1     2   r--    1479.4  all / 0-7
test                   4     2    15   -b-       7.5  all / 8-15
test                   4     3    10   -b-    1324.8  all / 8-15

Checking inside the guest confirms that *everything* runs on vCPUs 0 and 1. However, using schedtool or taskset, I can force tasks to execute on vCPUs 2 and 3.

Inspecting the guest's dmesg, I've seen this:

[    0.128416] ------------[ cut here ]------------
[    0.128416] WARNING: CPU: 2 PID: 0 at ../arch/x86/kernel/smpboot.c:317 topology_sane.isra.2+0x74/0x88()
[    0.128416] sched: CPU #2's smt-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.128416] Modules linked in:
[    0.128416] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0+ #1
[    0.128416]  0000000000000009 ffff88001ee3bdd0 ffffffff81657c7b ffffffff810bbd2c
[    0.128416]  ffff88001ee3be20 ffff88001ee3be10 ffffffff81081510 ffff88001ee3bea0
[    0.128416]  ffffffff8103aa02 ffff88003ea0a001 0000000000000000 ffff88001f20a040
[    0.128416] Call Trace:
[    0.128416]  [] dump_stack+0x4f/0x7b
[    0.128416]  [] ? up+0x39/0x3e
[    0.128416]  [] warn_slowpath_common+0xa1/0xbb
[    0.128416]  [] ? topology_sane.isra.2+0x74/0x88
[    0.128416]  [] warn_slowpath_fmt+0x46/0x48
[    0.128416]  [] ? __cpuid.constprop.0+0x15/0x19
[    0.128416]  [] topology_sane.isra.2+0x74/0x88
[    0.128416]  [] set_cpu_sibling_map+0x21a/0x444
[    0.128416]  [] ? numa_add_cpu+0x98/0x9f
[    0.128416]  [] cpu_bringup+0x63/0xa8
[    0.128416]  [] cpu_bringup_and_idle+0xe/0x1a
[    0.128416] ---[ end trace 95bff1aef57ee1b1 ]---

So, basically, Linux is complaining that we're trying to put two vCPUs that look to be SMT siblings on different NUMA nodes. And, yes, I think this is quite disruptive for the internal logic of Linux's scheduler.
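For reference, the check that fires here is topology_sane() in arch/x86/kernel/smpboot.c. Below is a minimal, compilable sketch of what it does, as far as I understand it (my paraphrase, not the kernel source; cpu_to_node() is stubbed with this guest's vNUMA node map):

/*
 * A minimal sketch (a paraphrase, not the verbatim kernel source) of the
 * sanity check in arch/x86/kernel/smpboot.c that produces the warning above:
 * if two CPUs that the topology presents as siblings sit on different NUMA
 * nodes, the sibling dependency is warned about and ignored.
 */
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the kernel's cpu_to_node(): the node map this guest
 * gets from the vNUMA config (vCPUs 0-1 on vnode 0, vCPUs 2-3 on vnode 1). */
static int cpu_to_node(int cpu)
{
    return cpu < 2 ? 0 : 1;
}

/* Mirrors the shape of the kernel's topology_sane() check. */
static bool topology_sane(int cpu1, int cpu2, const char *name)
{
    if (cpu_to_node(cpu1) == cpu_to_node(cpu2))
        return true;

    printf("sched: CPU #%d's %s-sibling CPU #%d is not on the same node! "
           "[node: %d != %d]. Ignoring dependency.\n",
           cpu1, name, cpu2, cpu_to_node(cpu1), cpu_to_node(cpu2));
    return false;
}

int main(void)
{
    /* If the guest concludes (e.g. from CPUID) that vCPU 2 is an SMT
     * sibling of vCPU 0, the check fires, as in the dmesg above. */
    topology_sane(2, 0, "smt");
    return 0;
}

When that check fails, the sibling dependency is dropped by set_cpu_sibling_map() (visible in the trace above), which is presumably what ends up confusing the scheduler's topology setup.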
The vnuma bits of the guest config are these:

  vnuma = [ [ "pnode=0","size=512","vcpus=0-1","vdistances=10,20" ],
            [ "pnode=1","size=512","vcpus=2-3","vdistances=20,10" ] ]

From inside the guest, the topology looks like this:

root@test:~# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 475 MB
node 0 free: 382 MB
node 1 cpus: 2 3
node 1 size: 495 MB
node 1 free: 475 MB
node distances:
node   0   1
  0:  10  10
  1:  20  10

root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0-1
root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
0-3
root@test:~# cat /sys/devices/system/cpu/cpu2/topology/thread_siblings_list
2-3
root@test:~# cat /sys/devices/system/cpu/cpu2/topology/core_siblings_list
0-3

So the complaint during boot seems to be against 'core_siblings' (which was not what I expected, but perhaps I misremember the meaning of "core_siblings" vs. "thread_siblings" vs. smt-siblings in Linux; I'll double check, using something like the little helper sketched after my signature).

Anyway, is there anything we can do to fix or work around this?

Regards,
Dario

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
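In case it helps with that double check, here is a small stand-alone helper (just a sketch of mine, not an existing tool) that prints, for each online vCPU, its thread/core sibling maps together with the NUMA node the guest kernel assigned it to, so the output can be compared with the vNUMA layout in one glance:

/*
 * Dump, for each online CPU, the thread/core sibling maps and the NUMA node
 * the guest kernel assigned it to. A sketch, not an existing tool; it stops
 * at the first CPU whose topology directory is missing.
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a sysfs file into buf (empty string on failure). */
static void read_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    buf[0] = '\0';
    if (!f)
        return;
    if (fgets(buf, (int)len, f))
        buf[strcspn(buf, "\n")] = '\0';
    fclose(f);
}

/* The NUMA node shows up as a "nodeN" symlink inside the cpu's sysfs dir. */
static int cpu_node(const char *cpudir)
{
    DIR *d = opendir(cpudir);
    struct dirent *e;
    int node = -1;

    if (!d)
        return -1;
    while ((e = readdir(d)))
        if (sscanf(e->d_name, "node%d", &node) == 1)
            break;
    closedir(d);
    return node;
}

int main(void)
{
    char cpudir[128], path[256], threads[64], cores[64];

    for (int cpu = 0; ; cpu++) {
        snprintf(cpudir, sizeof(cpudir), "/sys/devices/system/cpu/cpu%d", cpu);

        snprintf(path, sizeof(path), "%s/topology/thread_siblings_list", cpudir);
        read_line(path, threads, sizeof(threads));
        if (!threads[0])
            break;  /* no such CPU (or no topology info): we are done */

        snprintf(path, sizeof(path), "%s/topology/core_siblings_list", cpudir);
        read_line(path, cores, sizeof(cores));

        printf("cpu%d: node %d, thread_siblings %s, core_siblings %s\n",
               cpu, cpu_node(cpudir), threads, cores);
    }
    return 0;
}

On the guest above it should just condense the numactl and sysfs output into one line per vCPU.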