From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gordan Bobic
Subject: Re: Virt overehead with HT [was: Re: Xen 4.5 development update]
Date: Mon, 14 Jul 2014 19:31:16 +0100
Message-ID: <53C421F4.9070501@bobich.net>
References: <20140701164347.61662A7843@laptop.dumpdata.com>
 <1405354372.29306.687.camel@Solace>
 <53C4062A.3040403@bobich.net>
 <1405356283.7341.5.camel@Abyss>
 <53C40B91.7080006@eu.citrix.com>
 <1405358537.7341.19.camel@Abyss>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta4.messagelabs.com ([85.158.143.247])
 by lists.xen.org with esmtp (Exim 4.72) id 1X6l20-0003Xs-Qg
 for xen-devel@lists.xenproject.org; Mon, 14 Jul 2014 18:31:20 +0000
In-Reply-To: <1405358537.7341.19.camel@Abyss>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli, George Dunlap
Cc: Lars Kurth, George Dunlap, Ross Lagerwall,
 "stefano.stabellini@citrix.com", "xen-devel@lists.xenproject.org"
List-Id: xen-devel@lists.xenproject.org

On 07/14/2014 06:22 PM, Dario Faggioli wrote:
> On Mon, 2014-07-14 at 17:55 +0100, George Dunlap wrote:
>> On 07/14/2014 05:44 PM, Dario Faggioli wrote:
>>> On Mon, 2014-07-14 at 17:32 +0100, Gordan Bobic wrote:
>>>> On 07/14/2014 05:12 PM, Dario Faggioli wrote:
>>>>> Elapsed(stddev)   BAREMETAL             HVM
>>>>> kernbench -j4     31.604 (0.0963328)    34.078 (0.168582)
>>>>> kernbench -j8     26.586 (0.145705)     26.672 (0.0432435)
>>>>> kernbench -j      27.358 (0.440307)     27.49 (0.364897)
>>>>>
>>>>> With HT disabled in BIOS (which means only 4 CPUs for both):
>>>>> Elapsed(stddev)   BAREMETAL             HVM
>>>>> kernbench -j4     57.754 (0.0642651)    56.46 (0.0578792)
>>>>> kernbench -j8     31.228 (0.0775887)    31.362 (0.210998)
>>>>> kernbench -j      32.316 (0.0270185)    33.084 (0.600442)
>>> BTW, there's a mistake here.
>>> The three runs, in the no-HT case are as
>>> follows:
>>> kernbench -j2
>>> kernbench -j4
>>> kernbench -j
>>>
>>> I.e., half the number of VCPUs, as much as there are VCPUs and
>>> unlimited, exactly as for the HT case.
>>
>> Ah -- that's a pretty critical piece of information.
>>
>> So actually, on native, HT enabled and disabled effectively produce the
>> same exact thing if HT is not actually being used: 31 seconds in both
>> cases. But on Xen, enabling HT when it's not being used (i.e., when in
>> theory each core should have exactly one process running), performance
>> goes from 31 seconds to 34 seconds -- roughly a 10% degradation.
>>
> Yes. 7.96% degradation, to be precise.
>
> I attempted an analysis in my first e-mail. Cutting and pasting it
> here... What do you think?
>
> "I guess I can investigate a bit more about what happens with '-j4'.
> What I suspect is that the scheduler may make a few non-optimal
> decisions wrt HT, when there are more PCPUs than busy guest VCPUs. This
> may be due to the fact that Dom0 (or another guest VCPU doing other
> stuff than kernbench) may be already running on PCPUs that are on
> different cores than the guest's one (i.e., the guest VCPUs that wants
> to run kernbench), and that may force two guest's vCPUs to execute on
> two HTs some of the time (which of course is something that does not
> happen on baremetal!)."
>
> I just re-run the benchmark with credit2, which has no SMT knowledge,
> and the first run (the one that does not use HT) ended up to be 37.54,
> while the other two were pretty much the same of above (26.81 and
> 27.92).
>
> This confirms, for me, that it's an SMT balancing issue that we're seen.
>
> I'll try more runs, e.g. with number of VCPUs equal less than
> nr_corse/2 and see what happens.
>
> Again, thoughts?

Have you tried it with VCPUs pinned to appropriate PCPUs?
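
To make the pinning suggestion concrete, here is a rough sketch (not something that was run in this thread) of one way to test the SMT hypothesis: pin each guest VCPU to a different physical core, so that no two VCPUs can end up on sibling hyperthreads. Everything specific in it is an assumption for illustration: the PCPU-to-core map is a made-up 4-core/8-thread layout with adjacent siblings (the real one should be read from `xl info -n`), the domain name `kernbench-hvm` is hypothetical, and the guest is assumed to have 4 VCPUs. The script only prints `xl vcpu-pin` commands; it does not execute them.

```python
from collections import defaultdict

# Hypothetical PCPU -> core map: 4 cores, 2 hyperthreads each, siblings
# numbered adjacently. The real layout must come from `xl info -n`.
TOPOLOGY = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}


def one_thread_per_core(cpu_to_core):
    """Return the lowest-numbered PCPU of each core, ordered by core id."""
    threads = defaultdict(list)
    for cpu, core in cpu_to_core.items():
        threads[core].append(cpu)
    return [min(cpus) for _, cpus in sorted(threads.items())]


def pin_commands(domain, nr_vcpus, cpu_to_core):
    """Build one `xl vcpu-pin <domain> <vcpu> <pcpu>` line per VCPU,
    giving every VCPU a core of its own."""
    pcpus = one_thread_per_core(cpu_to_core)
    if nr_vcpus > len(pcpus):
        raise ValueError("more VCPUs than cores: sibling sharing is unavoidable")
    return [f"xl vcpu-pin {domain} {vcpu} {pcpu}"
            for vcpu, pcpu in zip(range(nr_vcpus), pcpus)]


if __name__ == "__main__":
    # "kernbench-hvm" is a placeholder domain name.
    for cmd in pin_commands("kernbench-hvm", 4, TOPOLOGY):
        print(cmd)
```

If the -j4 figure with this kind of pinning drops back towards the baremetal one, that would point at SMT placement in the scheduler rather than at virtualization overhead as such.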