From: George Dunlap
Subject: Re: CAP and performance problem
Date: Thu, 6 Jun 2013 10:02:40 +0100
Message-ID: <51B05030.5000903@eu.citrix.com>
In-Reply-To: <1370451024.18519.190.camel@Solace>
References: <519B3832.30608@di.unipmn.it> <1370451024.18519.190.camel@Solace>
To: Dario Faggioli
Cc: Massimo Canonico, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 05/06/13 17:50, Dario Faggioli wrote:
> On Tue, 2013-05-21 at 11:02 +0200, Massimo Canonico wrote:
>> Hi,
>>
> Hi again,
>
>> I sent the following problem to the xen-users ML without getting an
>> answer. I hope I'll get one on this ML.
>>
>> My application is written in standard C++ and performs a matrix
>> multiplication, so it uses only CPU and memory (no I/O, no network).
>>
>> I'm quite surprised that with CAP = 100% I get my results in about
>> 600 seconds, while with CAP = 50% I get them in about 1800 seconds
>> (around 3 times longer).
>>
>> For this kind of application I was expecting the second scenario to
>> take about 1200 seconds (2 times longer than the first).
>>
>> Of course, the HW and SW are exactly the same for the 2 experiments.
>>
>> Am I wrong, or is the CAP mechanism not working properly?
>>
> Ok, I found a minute to run your code myself on my test box. It's quite
> a large one, but since the VM has only 1 vcpu, that shouldn't really
> make much difference.
>
> I configured vcpu-pinning in such a way that there should be no room
> for interference of any kind, i.e., dedicating a core to the VM, and
> making sure that even its sibling thread is not busy (which matters on
> a hyperthreaded system):
>
> # xl vcpu-list
> Name           ID  VCPU   CPU State   Time(s) CPU Affinity
> Domain-0        0     0     7   -b-      38.7 0-7
> Domain-0        0     1     3   -b-       2.3 0-7
> Domain-0        0     2     2   -b-       3.3 0-7
> Domain-0        0     3     6   -b-       6.8 0-7
> Domain-0        0     4     4   -b-       3.2 0-7
> Domain-0        0     5     2   -b-       3.6 0-7
> Domain-0        0     6     4   -b-       2.1 0-7
> Domain-0        0     7     1   -b-       1.8 0-7
> Domain-0        0     8     0   -b-       2.2 0-7
> Domain-0        0     9     7   -b-       1.7 0-7
> Domain-0        0    10     1   -b-       1.8 0-7
> Domain-0        0    11     5   r--      10.4 0-7
> Domain-0        0    12     1   -b-       3.5 0-7
> Domain-0        0    13     2   -b-       3.5 0-7
> Domain-0        0    14     3   -b-       2.7 0-7
> Domain-0        0    15     0   -b-       1.9 0-7
> vm1             1     0    11   -b-     677.0 11
>
> The numbers I'm getting are, I think, much more consistent with the
> expectations:
>
> * no cap:
>   Client served in 299.024
>   Client served in 298.783
>   Client served in 298.445
> * cap 50%:
>   Client served in 643.668
>   Client served in 643.372
>   Client served in 644.342
>
> Which means the time roughly doubles.
>
> I tried without pinning as well, and I'm getting pretty much the same
> values.
>
> At this point, I'm not sure what could be going on on your side. If you
> want to try producing some traces, we can help inspect them, looking
> for something weird. You can find some information about how to produce
> and interpret traces in this blog post:
>
> http://blog.xen.org/index.php/2012/09/27/tracing-with-xentrace-and-xenalyze/
>
> Perhaps you can share your VM config file and Dom0 configuration
> (basically, the Xen and Linux boot command lines), so we can check
> whether there is something strange there. Also, you might have said
> this already (in which case I forgot): what versions of Xen and Linux
> are we talking about?
>
> I really am out of good ideas... George, any clue?

Well, for one, from the scheduler's perspective the promise isn't that
you'll get 50% of the *performance*, but 50% of the *cpu time*.

I haven't been following the thread terribly closely, but I don't
remember seeing any xentop or xentrace reports. The first question is:
other than performance, do you have any reason to believe that the VM
is not getting 50% of the cpu time?

At some point while your test is running, could you execute the
following command in dom0:

  xentrace -D -e 0x21000 -T 10 /tmp/test.trace

This will take a 10-second trace of just the scheduling events, placing
the result in /tmp/test.trace.

Then download and build xenalyze from the hg repo here:

  http://xenbits.xen.org/ext/xenalyze

run the following command:

  xenalyze -s /tmp/test.trace > /tmp/test.summary

and post the results here?

Thanks,
 -George
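
For anyone trying to reproduce the setup Dario describes above, the
pinning and the cap can be arranged roughly like this; the domain name
vm1 and the pcpu number are simply the ones from his vcpu listing, so
adjust them for your own box:

  xl vcpu-pin vm1 0 11            # pin vcpu 0 of vm1 to physical cpu 11
  xl sched-credit -d vm1 -c 50    # cap vm1 at 50% of one physical cpu
  xl sched-credit -d vm1 -c 0     # remove the cap again (0 means no cap)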
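
As a quick first check, before going to the trouble of tracing, the
CPU(%) column of xentop should already tell you whether the cap is being
enforced: with a 50% cap and a CPU-bound guest it should hover around 50.
Something along these lines (batch mode, one sample every 5 seconds, a
dozen samples) is usually enough, again assuming the guest is named vm1:

  xentop -b -d 5 -i 12 | grep vm1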