From: Gordan Bobic
Subject: Re: Virt overhead with HT [was: Re: Xen 4.5 development update]
Date: Tue, 15 Jul 2014 01:10:09 +0100
Message-ID: <53C47161.1060008@bobich.net>
In-Reply-To: <1405377850.5333.17.camel@Solace>
To: Dario Faggioli
Cc: Lars Kurth, George Dunlap, George Dunlap, Ross Lagerwall,
 "stefano.stabellini@citrix.com", "xen-devel@lists.xenproject.org"
List-Id: xen-devel@lists.xenproject.org

On 07/14/2014 11:44 PM, Dario Faggioli wrote:
> On lun, 2014-07-14 at 19:31 +0100, Gordan Bobic wrote:
>> On 07/14/2014 06:22 PM, Dario Faggioli wrote:
>
>>> I'll try more runs, e.g. with the number of VCPUs equal to or less
>>> than nr_cores/2, and see what happens.
>>>
>>> Again, thoughts?
>>
>> Have you tried it with VCPUs pinned to appropriate PCPUs?
>>
> Define "appropriate".
>
> I have a run for which I pinned VCPU#1-->PCPU#1, VCPU#2-->PCPU#2, and so
> on, and the result is even worse:
>
> Average Half load -j 4 Run (std deviation):
> Elapsed Time 37.808 (0.538999)
> Average Optimal load -j 8 Run (std deviation):
> Elapsed Time 26.594 (0.235223)
> Average Maximal load -j Run (std deviation):
> Elapsed Time 27.9 (0.131149)
>
> This is actually something I expected, since you do not allow the VCPUs
> to move away from a hyperthread whose sibling is busy, even when they
> could have.
>
> In fact, you could expect better results from pinning only if you pinned
> not only the VCPUs to the PCPUs, but also the kernbench build jobs to the
> appropriate (V)CPUs in the guest.. but that is not only really
> impractical, it is also not very representative as a benchmark, I think.
>
> If you pin VCPU#1 to PCPU#1 and VCPU#2 to PCPU#2, with PCPU#1 and PCPU#2
> being HT siblings, what prevents Linux (in the guest) from running two of
> the four build jobs on VCPU#1 and VCPU#2 (i.e., on sibling PCPUs!!) for
> the whole length of the benchmark? Nothing, I think.

That would imply that Xen can somehow make a better decision than the
domU's kernel scheduler, which doesn't seem very likely. I would expect
not pinning VCPUs to increase process migration, because Xen might move
a VCPU to a different PCPU even after the kernel in the domU had decided
which presented CPU was the most lightly loaded.

> And in fact, pinning would also give good (near to native, perhaps?)
> performance if we were exposing the SMT topology details to guests, as
> in that case Linux would do the balancing properly. However, that's not
> the case either. :-(

I see, so you are referring specifically to the HT case. I can see how
that could cause a problem. Does pinning improve the performance with HT
disabled?

Gordan
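
For anyone wanting to reproduce the pinning setup Dario describes, a
minimal sketch using xl vcpu-pin on the host and taskset inside the
guest; the domain name "domU", the CPU numbers and the make invocation
are illustrative only and should be adjusted to the actual topology:

    # On the host: pin each VCPU of the guest to one PCPU
    xl vcpu-pin domU 0 0
    xl vcpu-pin domU 1 1
    xl vcpu-pin domU 2 2
    xl vcpu-pin domU 3 3
    xl vcpu-list domU        # verify the resulting affinities

    # Inside the guest: additionally pin a build job to a given VCPU
    # (this is the part Dario calls impractical for a real benchmark)
    taskset -c 0 make -j 1 ...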