From: George Dunlap
Subject: Re: CAP and performance problem
Date: Thu, 6 Jun 2013 10:02:40 +0100
Message-ID: <51B05030.5000903@eu.citrix.com>
In-Reply-To: <1370451024.18519.190.camel@Solace>
References: <519B3832.30608@di.unipmn.it> <1370451024.18519.190.camel@Solace>
To: Dario Faggioli
Cc: Massimo Canonico, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 05/06/13 17:50, Dario Faggioli wrote:
> On Tue, 2013-05-21 at 11:02 +0200, Massimo Canonico wrote:
>> Hi,
>>
> Hi again,
>
>> I sent the following problem to the xen-users ML without getting an
>> answer. I hope I'll get one on this ML.
>>
>> My application is written in standard C++ and performs a matrix
>> multiplication, so it uses only CPU and memory (no I/O, no network).
>>
>> I'm quite surprised that with CAP = 100% I get my results in about
>> 600 seconds, while with CAP = 50% I get them in about 1800 seconds
>> (around 3 times longer).
>>
>> For this kind of application I was expecting the second scenario to
>> take about 1200 seconds (2 times longer than the first).
>>
>> Of course, the HW and SW are exactly the same for the 2 experiments.
>>
>> Am I wrong, or is the CAP mechanism not working properly?
>>
> Ok, I found a minute to run your code myself on my test box. It's quite
> a large one, but since the VM has only 1 vcpu, that shouldn't really
> make much difference.
>
> I configured vcpu-pinning in such a way that there should be no room
> for interference of any kind, i.e., dedicating a core to the VM, and
> making sure that even its sibling thread is not busy (which matters on
> a hyperthreaded system):
>
> # xl vcpu-list
> Name           ID  VCPU   CPU State   Time(s) CPU Affinity
> Domain-0        0     0     7   -b-      38.7 0-7
> Domain-0        0     1     3   -b-       2.3 0-7
> Domain-0        0     2     2   -b-       3.3 0-7
> Domain-0        0     3     6   -b-       6.8 0-7
> Domain-0        0     4     4   -b-       3.2 0-7
> Domain-0        0     5     2   -b-       3.6 0-7
> Domain-0        0     6     4   -b-       2.1 0-7
> Domain-0        0     7     1   -b-       1.8 0-7
> Domain-0        0     8     0   -b-       2.2 0-7
> Domain-0        0     9     7   -b-       1.7 0-7
> Domain-0        0    10     1   -b-       1.8 0-7
> Domain-0        0    11     5   r--      10.4 0-7
> Domain-0        0    12     1   -b-       3.5 0-7
> Domain-0        0    13     2   -b-       3.5 0-7
> Domain-0        0    14     3   -b-       2.7 0-7
> Domain-0        0    15     0   -b-       1.9 0-7
> vm1             1     0    11   -b-     677.0 11
>
> The numbers I'm getting are, I think, much more consistent with the
> expectations:
>
> * no cap:
>   Client served in 299.024
>   Client served in 298.783
>   Client served in 298.445
> * cap 50%:
>   Client served in 643.668
>   Client served in 643.372
>   Client served in 644.342
>
> Which means the time roughly doubles.
>
> I tried without pinning as well, and I'm getting pretty much the same
> values.
>
> At this point, I'm not sure what could be going on on your side. If you
> want to try producing some traces, we can help inspect them, looking
> for something weird. You can find some information about how to produce
> and interpret traces in this blog post:
>
> http://blog.xen.org/index.php/2012/09/27/tracing-with-xentrace-and-xenalyze/
>
> Perhaps you can share your VM config file and Dom0 configuration
> (basically, the Xen and Linux boot command lines), so we can check
> whether there is something strange there. Also, you might have said
> this already (in which case I forgot): what versions of Xen and Linux
> are we talking about?
>
> I really am out of good ideas... George, any clue?

Well, for one, from the scheduler's perspective the promise isn't that
you'll get 50% of the *performance*, but 50% of the *cpu time*.

I haven't been following the thread terribly closely, but I don't
remember seeing any xentop or xentrace reports. The first question is:
other than performance, do you have any reason to believe that the VM
is not getting 50% of the cpu time?

At some point while your test is running, could you execute the
following command in dom0:

  xentrace -D -e 0x21000 -T 10 /tmp/test.trace

This will take a 10-second trace of just the scheduling events, placing
the result in /tmp/test.trace.

Then download and build xenalyze from the hg repo here:

  http://xenbits.xen.org/ext/xenalyze

run the following command:

  xenalyze -s /tmp/test.trace > /tmp/test.summary

and post the results here?

Thanks,
 -George
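
For anyone trying to reproduce the setup Dario describes above, the
pinning and the cap can be arranged roughly like this; the domain name
vm1 and the pcpu number are simply the ones from his vcpu listing, so
adjust them for your own box:

  xl vcpu-pin vm1 0 11            # pin vcpu 0 of vm1 to physical cpu 11
  xl sched-credit -d vm1 -c 50    # cap vm1 at 50% of one physical cpu
  xl sched-credit -d vm1 -c 0     # remove the cap again (0 means no cap)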
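
As a quick first check, before going to the trouble of tracing, the
CPU(%) column of xentop should already tell you whether the cap is being
enforced: with a 50% cap and a CPU-bound guest it should hover around 50.
Something along these lines (batch mode, one sample every 5 seconds, a
dozen samples) is usually enough, again assuming the guest is named vm1:

  xentop -b -d 5 -i 12 | grep vm1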