From: Don Slutz
Subject: Re: Strange interdependace between domains
Date: Fri, 14 Feb 2014 05:26:08 -0500
Message-ID: <52FDEF40.8040709@terremark.com>
In-Reply-To: <1392333198.32038.153.camel@Solace>
References: <1646915994.20140213165604@gmail.com>
 <1392313015.32038.112.camel@Solace>
 <295276356.20140213222507@gmail.com>
 <1392333198.32038.153.camel@Solace>
To: Dario Faggioli
Cc: Simon Martin, Andrew Cooper, Nate Studer, Don Slutz,
 xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 02/13/14 18:13, Dario Faggioli wrote:
> On Thu, 2014-02-13 at 22:25 +0000, Simon Martin wrote:
>> Thanks for all the replies guys.
>>
> :-)
>
>> Don> How many instructions per second a thread gets does depend on the
>> Don> "idleness" of other threads (no longer just the hyperthread's
>> Don> partner).
>>
>> This seems a bit strange to me. In my case I have a time-critical PV
>> domain running by itself in a CPU pool, so Xen should not be
>> scheduling it, and I can't see how this hypervisor thread would be
>> affected.
>>
> I think Don is referring to the idleness of the other _hardware_ threads
> in the chip, rather than software threads of execution, either in Xen or
> in Dom0/DomU. I checked his original e-mail and, AFAIUI, he seems to
> confirm that the throughput you get on, say, core 3 depends on what its
> sibling core (which really is its sibling hyperthread, again in the
> hardware sense... Gah, the terminology is just a mess! :-P) is doing. He
> also seems to add that there is a similar kind of inter-dependency
> between all the hardware hyperthreads, not just between siblings.
>
> Does this make sense, Don?
>
Yes, but the results I am getting vary based on the distro (most likely
the microcode version).
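On any particular box, the microcode revision and the hyperthread
sibling pairs can be double-checked from the usual /proc and /sys
locations; the CPU number below is only an example, and older kernels
may not expose the /proc/cpuinfo field at all:

    # Microcode revision as seen by the kernel (older kernels that lack
    # this field report it only in the dmesg lines quoted further down):
    grep microcode /proc/cpuinfo | sort -u

    # Logical CPUs that share a physical core with CPU 7:
    cat /sys/devices/system/cpu/cpu7/topology/thread_siblings_list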
Linux (and, I think, Xen) both have a CPU scheduler that picks whole
cores before sibling threads:

top - 04:06:29 up 66 days, 15:31, 11 users,  load average: 2.43, 0.72, 0.29
Tasks: 250 total,   1 running, 249 sleeping,   0 stopped,   0 zombie
Cpu0  : 99.7%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.2%hi,  0.1%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Cpu2  : 99.9%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.1%hi,  0.0%si,  0.0%st
Cpu3  :  1.6%us,  0.1%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 99.9%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.1%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  1.4%us,  0.0%sy,  0.0%ni, 98.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  : 99.9%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.1%hi,  0.0%si,  0.0%st
Mem:  32940640k total, 18008576k used, 14932064k free,   285740k buffers
Swap: 10223612k total,     4696k used, 10218916k free, 16746224k cached

That is an example without Xen involved, on Fedora 17:

Linux dcs-xen-50 3.8.11-100.fc17.x86_64 #1 SMP Wed May 1 19:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

On this machine:

Just 7:
            start                       done
thr 0: 14 Feb 14 04:11:08.944566  14 Feb 14 04:13:20.874764  +02:11.930198 ~= 131.93 and 9.10 GiI/Sec

6 & 7:
            start                       done
thr 0: 14 Feb 14 04:14:31.010426  14 Feb 14 04:18:55.404116  +04:24.393690 ~= 264.39 and 4.54 GiI/Sec
thr 1: 14 Feb 14 04:14:31.010426  14 Feb 14 04:18:55.415561  +04:24.405135 ~= 264.41 and 4.54 GiI/Sec

5 & 7:
            start                       done
thr 0: 14 Feb 14 04:20:28.902831  14 Feb 14 04:22:45.563511  +02:16.660680 ~= 136.66 and 8.78 GiI/Sec
thr 1: 14 Feb 14 04:20:28.902831  14 Feb 14 04:22:46.182159  +02:17.279328 ~= 137.28 and 8.74 GiI/Sec

1 & 3 & 5 & 7:
            start                       done
thr 0: 14 Feb 14 04:32:24.353302  14 Feb 14 04:35:16.870558  +02:52.517256 ~= 172.52 and 6.96 GiI/Sec
thr 1: 14 Feb 14 04:32:24.353301  14 Feb 14 04:35:17.371155  +02:53.017854 ~= 173.02 and 6.94 GiI/Sec
thr 2: 14 Feb 14 04:32:24.353302  14 Feb 14 04:35:17.225871  +02:52.872569 ~= 172.87 and 6.94 GiI/Sec
thr 3: 14 Feb 14 04:32:24.353302  14 Feb 14 04:35:16.655362  +02:52.302060 ~= 172.30 and 6.96 GiI/Sec

This is from:

Feb 14 04:29:21 dcs-xen-51 kernel: [   41.921367] microcode: CPU3 updated to revision 0x28, date = 2012-04-24

On CentOS 5.10:

Linux dcs-xen-53 2.6.18-371.el5 #1 SMP Tue Oct 1 08:35:08 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

only 7:
            start                       done
thr 0: 14 Feb 14 09:43:10.903549  14 Feb 14 09:46:04.925463  +02:54.021914 ~= 174.02 and 6.90 GiI/Sec

6 & 7:
            start                       done
thr 0: 14 Feb 14 09:49:17.804633  14 Feb 14 09:55:02.473549  +05:44.668916 ~= 344.67 and 3.48 GiI/Sec
thr 1: 14 Feb 14 09:49:17.804618  14 Feb 14 09:55:02.533243  +05:44.728625 ~= 344.73 and 3.48 GiI/Sec

5 & 7:
            start                       done
thr 0: 14 Feb 14 10:01:30.566603  14 Feb 14 10:04:23.024858  +02:52.458255 ~= 172.46 and 6.96 GiI/Sec
thr 1: 14 Feb 14 10:01:30.566603  14 Feb 14 10:04:23.069964  +02:52.503361 ~= 172.50 and 6.96 GiI/Sec

1 & 3 & 5 & 7:
            start                       done
thr 0: 14 Feb 14 10:05:58.359646  14 Feb 14 10:08:50.984629  +02:52.624983 ~= 172.62 and 6.95 GiI/Sec
thr 1: 14 Feb 14 10:05:58.359646  14 Feb 14 10:08:50.993064  +02:52.633418 ~= 172.63 and 6.95 GiI/Sec
thr 2: 14 Feb 14 10:05:58.359645  14 Feb 14 10:08:50.857982  +02:52.498337 ~= 172.50 and 6.96 GiI/Sec
thr 3: 14 Feb 14 10:05:58.359645  14 Feb 14 10:08:50.905031  +02:52.545386 ~= 172.55 and 6.95 GiI/Sec

Feb 14 09:41:42 dcs-xen-53 kernel: microcode: CPU3 updated from revision 0x17 to 0x29, date = 06122013

Hope this helps.
    -Don Slutz
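The benchmark source itself is not shown in this thread (the headings
appear to be the pcpus the benchmark threads were pinned to, and
GiI/Sec presumably giga-instructions per second), but the
core-vs-sibling effect should be reproducible with any CPU-bound loop
pinned via taskset. A rough sketch, with an arbitrary iteration count
and assuming CPUs 6 and 7 are hyperthread siblings as the numbers above
suggest:

    # One busy loop alone on CPU 7:
    time taskset -c 7 sh -c 'i=0; while [ $i -lt 50000000 ]; do i=$((i+1)); done'

    # The same loop on both hyperthreads of one core (6 and 7); each
    # instance should take noticeably longer than the solo run
    # (roughly 2x in the "6 & 7" numbers above):
    time taskset -c 6 sh -c 'i=0; while [ $i -lt 50000000 ]; do i=$((i+1)); done' &
    time taskset -c 7 sh -c 'i=0; while [ $i -lt 50000000 ]; do i=$((i+1)); done' &
    wait

    # Loops pinned to separate cores (e.g. 5 and 7) should stay close
    # to the solo time, matching the "5 & 7" results above.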
>>>> 6.- All VCPUs are pinned:
>>>>
>> Dario> Right, although, if you use cpupools, and if I've understood
>> Dario> what you're up to, you really should not require pinning. I
>> Dario> mean, the isolation between the RT-ish domain and the rest of
>> Dario> the world should already be in place thanks to cpupools.
>>
>> This is what I thought; however, when looking at the vcpu-list output,
>> the CPU affinity was "all" until I started pinning. As I wasn't sure
>> whether that was "all inside this cpupool" or "all", I felt it was
>> safer to do it explicitly.
>>
> Actually, you are right, we could present things in a way that is
> clearer when one observes the output! So, I confirm that, despite the
> fact that you see "all", that all is relative to the cpupool the domain
> is assigned to.
>
> I'll try to think about how to make this more evident... A note in the
> manpage and/or the various sources of documentation is the easy (but
> still necessary, I agree) part, and I'll add this to my TODO list.
> Actually modifying the output is trickier, as affinity and cpupools are
> orthogonal by design, and that is the right (IMHO) thing.
>
> I guess trying to tweak the printf()-s in `xl vcpu-list' would not be
> that hard... I'll have a look and see if I can come up with a proposal.
>
>> Dario> So, if you ask me, you're restricting things too much in
>> Dario> pool-0, where dom0 and the Windows VM run. In fact, is there a
>> Dario> specific reason why you need all their vcpus to be statically
>> Dario> pinned, each one to only one pcpu? If not, I'd leave them a
>> Dario> little bit more freedom.
>>
>> I agree with you here; however, when I don't pin, CPU affinity is
>> "all". Is this "all in the CPU pool"? I couldn't find that info.
>>
> Again, yes: once a domain is in a cpupool, no matter what its affinity
> says, it won't ever reach a pcpu assigned to another cpupool. The
> technical reason is that each cpupool is ruled by its own (copy of a)
> scheduler, even if you use, e.g., credit for both/all the pools. In
> that case, what you get are two full instances of credit, completely
> independent of each other, each one in charge of only a very specific
> subset of pcpus (as mandated by cpupools). So, different runqueues,
> different data structures, different everything.
>
>> Dario> What I'd try is:
>> Dario>  1. all dom0 and win7 vcpus free, so no pinning in pool0.
>> Dario>  2. pinning as follows:
>> Dario>     * all vcpus of win7 --> pcpus 1,2
>> Dario>     * all vcpus of dom0 --> no pinning
>> Dario> This way, what you get is the following: win7 could suffer
>> Dario> sometimes, if all its 3 vcpus get busy, but that, I think, is
>> Dario> acceptable, at least up to a certain extent. Is that the case?
>> Dario> At the same time, you are making sure dom0 always has a chance
>> Dario> to run, as pcpu#0 would be its exclusive playground, in case
>> Dario> someone, including your pv499 domain, needs its services.
>>
>> This is what I had when I started :-). Thanks for the confirmation
>> that I was doing it right. However, if hyperthreading is the issue,
>> then I will only have 2 pcpus available, and I will assign them both
>> to dom0 and win7.
>>
> Yes, with hyperthreading in mind, that is what you should do.
>
> Once we have confirmed that hyperthreading is the issue, we'll see what
> we can do. I mean, if, in your case, it's fine to 'waste' a cpu, then
> ok, but I think we need a general solution for this... Perhaps with a
> little worse performance than just leaving one core/hyperthread
> completely idle, but at the same time more resource-efficient.
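For reference, the pinning scheme Dario suggests above maps onto xl
roughly like this (domain name as used in this thread; `xl help
vcpu-pin` shows the exact syntax for a given version):

    # Restrict every vcpu of the win7 domain to pcpus 1 and 2, while
    # leaving dom0's vcpus unpinned:
    xl vcpu-pin win7 all 1,2

    # Check the result; remember that an affinity of "all" here is
    # still relative to the cpupool the domain is assigned to:
    xl vcpu-list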
>
> I wonder how tweaking sched_smt_power_savings would deal with this...
>
>> Dario> Right. Are you familiar with tracing what happens inside Xen
>> Dario> with xentrace and, perhaps, xenalyze? It takes a bit of time to
>> Dario> get used to it but, once you master it, it is a good means of
>> Dario> getting out really useful info!
>>
>> Dario> There is a blog post about that here:
>> Dario> http://blog.xen.org/index.php/2012/09/27/tracing-with-xentrace-and-xenalyze/
>> Dario> and it should have most of the info, or the links to where to
>> Dario> find them.
>>
>> Thanks for this. If this problem is more than the hyperthreading then
>> I will definitely use it. It also looks like it might be useful when I
>> start looking at the jitter on the singleshot timer (which should be
>> in a couple of weeks).
>>
> It will prove to be very useful for that, I'm sure! :-)
>
> Let us know how the re-testing goes.
>
> Regards,
> Dario
>
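For completeness: sched_smt_power_savings is a boot-time Xen parameter,
and the basic xentrace/xenalyze workflow from the blog post linked
above looks roughly like the following (exact options differ between
Xen versions, so check the man pages first):

    # sched_smt_power_savings goes on the Xen command line in the boot
    # loader entry (illustrative grub line, other arguments elided):
    #   multiboot /boot/xen.gz ... sched_smt_power_savings=1

    # Capture a trace on the host (stop with Ctrl-C), then post-process
    # it with xenalyze's summary mode:
    xentrace /tmp/trace.bin
    xenalyze --summary /tmp/trace.bin > /tmp/trace-summary.txt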