From: Dario Faggioli
To: Andrew Cooper
Cc: Elena Ufimtseva, Wei Liu, David Vrabel, Jan Beulich, xen-devel@lists.xenproject.org, Boris Ostrovsky
Date: Mon, 27 Jul 2015 19:42:05 +0200
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Message-ID: <1438018925.5036.242.camel@citrix.com>
In-Reply-To: <55B65D77.1050202@citrix.com>

On Mon, 2015-07-27 at 17:33 +0100, Andrew Cooper wrote:
> On 27/07/15 17:31, David Vrabel wrote:
> >
> >> Yeah, indeed. That's the downside of Juergen's "Linux scheduler
> >> approach". But the issue is there, even without taking vNUMA into
> >> account, and I think something like that would really help (only for
> >> Dom0, and Linux guests, of course).
> > I disagree. Whether we're using vNUMA or not, Xen should still ensure
> > that the guest kernel and userspace see a consistent and correct
> > topology using the native mechanisms.
>
> +1
>
+1 from me as well.
In fact, a mechanism for making exactly such a thing happen was what I was after when I started the thread. Then it came up that CPUID needs to be used for at least two different, and potentially conflicting, purposes; that we want to support both; and that, whether and for whatever reason it's used, Linux configures its scheduler according to it, potentially resulting in rather pathological setups. It's at that point that some decoupling started to look interesting... :-P

Also, are we really being consistent? If my methodology is correct (which it might not be, so please double check, and sorry if that's the case), I'm seeing quite a bit of inconsistency around:

HOST:
root@Zhaman:~# xl info -n
...
cpu_topology           :
cpu:    core    socket     node
  0:       0        1        0
  1:       0        1        0
  2:       1        1        0
  3:       1        1        0
  4:       9        1        0
  5:       9        1        0
  6:      10        1        0
  7:      10        1        0
  8:       0        0        1
  9:       0        0        1
 10:       1        0        1
 11:       1        0        1
 12:       9        0        1
 13:       9        0        1
 14:      10        0        1
 15:      10        0        1
...
root@Zhaman:~# xl vcpu-list test
Name    ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
test     2     0     0  r--         1.5  0 / all
test     2     1     1  r--         0.2  1 / all
test     2     2     8  -b-         2.2  8 / all
test     2     3     9  -b-         2.0  9 / all

GUEST (HVM, 4 vcpus):
root@test:~# cpuid|grep CORE_ID
   (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
   (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=1
   (APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=0
   (APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=1

HOST:
root@Zhaman:~# xl vcpu-pin 2 all 0
root@Zhaman:~# xl vcpu-list 2
Name    ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
test     2     0     0  -b-        43.7  0 / all
test     2     1     0  -b-        38.4  0 / all
test     2     2     0  -b-        36.9  0 / all
test     2     3     0  -b-        38.8  0 / all

GUEST:
root@test:~# cpuid|grep CORE_ID
   (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
   (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
   (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
   (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0

HOST:
root@Zhaman:~# xl vcpu-pin 2 0 7
root@Zhaman:~# xl vcpu-pin 2 1 7
root@Zhaman:~# xl vcpu-pin 2 2 15
root@Zhaman:~# xl vcpu-pin 2 3 15
root@Zhaman:~# xl vcpu-list 2
Name    ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
test     2     0     7  -b-        44.3  7 / all
test     2     1     7  -b-        38.9  7 / all
test     2     2    15  -b-        37.3  15 / all
test     2     3    15  -b-        39.2  15 / all

GUEST:
root@test:~# cpuid|grep CORE_ID
   (APIC synth): PKG_ID=0 CORE_ID=26 SMT_ID=1
   (APIC synth): PKG_ID=0 CORE_ID=26 SMT_ID=1
   (APIC synth): PKG_ID=0 CORE_ID=10 SMT_ID=1
   (APIC synth): PKG_ID=0 CORE_ID=10 SMT_ID=1

So, it looks to me that:
 1) any application using CPUID for either licensing or placement/performance optimization will get (potentially) random results;
 2) whatever set of values the kernel used during guest boot to build up its internal scheduling data structures is not guaranteed to be related to any value returned by CPUID at a later point.

Hence, I think I'm already seeing inconsistency between kernel and userspace (and between userspace and itself, over time)...

Am I overlooking something?

(I'll provide the same, for a PV guest, tomorrow.)

Regards,
Dario
-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)