From: Boris Ostrovsky
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Wed, 22 Jul 2015 11:32:03 -0400
Message-ID: <55AFB773.7010906@oracle.com>
In-Reply-To: <1437576645.5036.56.camel@citrix.com>
References: <1437042762.28251.18.camel@citrix.com>
 <55A7A7F40200007800091D60@mail.emea.novell.com>
 <55A78DF2.1060709@citrix.com>
 <20150716152513.GU12455@zion.uk.xensource.com>
 <55A7D17C.5060602@citrix.com> <55A7D2CC.1050708@oracle.com>
 <55A7F7F40200007800092152@mail.emea.novell.com>
 <55A7DE45.4040804@citrix.com> <55A7E2D8.3040203@oracle.com>
 <55A8B83802000078000924AE@mail.emea.novell.com>
 <1437118075.23656.25.camel@citrix.com> <55A946C6.8000002@oracle.com>
 <1437401354.5036.19.camel@citrix.com> <55AD08F7.7020105@oracle.com>
 <55AEA4DD.7080406@oracle.com> <1437572160.5036.39.camel@citrix.com>
 <55AF9F8F.7030200@suse.com> <55AFA16B.3070103@oracle.com>
 <55AFA41E.1080101@suse.com> <1437576645.5036.56.camel@citrix.com>
To: Dario Faggioli, Juergen Gross
Cc: Elena Ufimtseva, Wei Liu, Andrew Cooper, David Vrabel, Jan Beulich,
 "xen-devel@lists.xenproject.org"
List-Id: xen-devel@lists.xenproject.org

On 07/22/2015 10:50 AM, Dario Faggioli wrote:
> On Wed, 2015-07-22 at 16:09 +0200, Juergen Gross wrote:
>> On 07/22/2015 03:58 PM, Boris Ostrovsky wrote:
>>> What if I configure a guest to follow HW topology? I.e. I pin VCPUs
>>> to appropriate cores/threads? With the elfnote I am stuck with
>>> disabled topology.
>> Add an option to do exactly that: follow HW topology (pin vcpus,
>> configure vnuma)?
>>
> I thought about configuring things in such a way that they match the
> host topology, as Boris is suggesting, too. And in that case, I think
> arranging for doing so in the toolstack, if PV vNUMA is identified (as
> I think Juergen is suggesting), seems a good approach.
>
> However, when I try to do that on my box, manually, I don't seem to be
> able to.
>
> Here's what I tried. Since I have this host topology:
> cpu_topology           :
> cpu:    core    socket     node
>   0:       0        1        0
>   1:       0        1        0
>   2:       1        1        0
>   3:       1        1        0
>   4:       9        1        0
>   5:       9        1        0
>   6:      10        1        0
>   7:      10        1        0
>   8:       0        0        1
>   9:       0        0        1
>  10:       1        0        1
>  11:       1        0        1
>  12:       9        0        1
>  13:       9        0        1
>  14:      10        0        1
>  15:      10        0        1
>
> I configured the guest like this:
> vcpus  = '4'
> memory = '1024'
> vnuma  = [ [ "pnode=0","size=512","vcpus=0-1","vdistances=10,20" ],
>            [ "pnode=1","size=512","vcpus=2-3","vdistances=20,10" ] ]
> cpus   = ["0","1","8","9"]
>
> This means vcpus 0 and 1, which are assigned to vnode 0, are pinned to
> pcpus 0 and 1, which are siblings, per the host topology.
> Similarly, vcpus 2 and 3, assigned to vnode 1, are pinned to two
> sibling pcpus on pnode 1.
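(A handy cross-check at this point -- just a sketch, assuming a standard
Linux domU with sysfs mounted -- is to look at the topology the guest
kernel itself derived from this setup:

  # cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
  # lscpu | grep -i -E 'thread|core|socket|numa'

If the guest believes some of its vcpus are HT siblings that were never
intended as such, its scheduler will balance load on that basis.)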
> This seems to be honoured:
> # xl vcpu-list 4
> Name        ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
> test         4     0     0   -b-       10.9  0 / 0-7
> test         4     1     1   -b-        7.6  1 / 0-7
> test         4     2     8   -b-        0.1  8 / 8-15
> test         4     3     9   -b-        0.1  9 / 8-15
>
> And yet, no joy:
> # ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> # ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> # ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> # ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> # xl vcpu-list 4
> Name        ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
> test         4     0     0   r--       16.4  0 / 0-7
> test         4     1     1   r--       12.5  1 / 0-7
> test         4     2     8   -b-        0.2  8 / 8-15
> test         4     3     9   -b-        0.1  9 / 8-15
>
> So, what am I doing wrong at "following the hw topology"?
>
>>> Besides, this is not necessarily a NUMA-only issue, it's a scheduling
>>> one (inside the guest) as well.
>> Sure. That's what Jan said regarding SUSE's xen-kernel. No topology
>> info (or a trivial one) might be better than the wrong one...
>>
> Yep. Exactly. As Boris says, this is a generic scheduling issue,
> although it's true that it's only (as far as I can tell) with vNUMA
> that it bites us so hard...

I am not sure that it's only vNUMA. It's just that with vNUMA we can see
a warning (on your system) that something goes wrong. In other cases
(like scheduling, or sizing objects based on discovered cache sizes --
see the sketch at the end) we don't see anything in the log, but the
system and programs are making wrong decisions. (And your results above
may well be an example of that.)

-boris

> I mean, performance is always going to be inconsistent,
> but it's only in that case that you basically _lose_ some of the
> vcpus! :-O
>
> Dario
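To make the cache-sizing point concrete -- just a sketch, assuming a
glibc-based Linux guest where getconf is available -- these are the
values a program would typically trust when sizing its buffers:

  # getconf -a | grep -i cache

Under PV these numbers ultimately come from the host's CPUID leaves, so
a program that sizes its working set to, say, the reported L3 can
mis-size it badly when its vcpus do not actually share that cache.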