From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andre Przywara <andre.przywara@amd.com>
Subject: Re: [PATCH 1 of 3 v5/leftover] libxl: enable automatic
 placement of guests on NUMA nodes
Date: Fri, 20 Jul 2012 10:26:04 +0200
Message-ID: <5009161C.2060005@amd.com>
References: <patchbomb.1342458791@Solace>
	<5fa66c8b9093399e5bc3.1342458792@Solace>
	<5007FBCE.6000201@amd.com> <1342707771.19530.235.camel@Solace>
	<1342772429.19530.247.camel@Solace>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <1342772429.19530.247.camel@Solace>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli <raistlin@linux.it>
Cc: Ian Campbell <Ian.Campbell@citrix.com>, Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>, George Dunlap <george.dunlap@eu.citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Juergen Gross <juergen.gross@ts.fujitsu.com>, Ian Jackson <Ian.Jackson@eu.citrix.com>, xen-devel <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

On 07/20/2012 10:20 AM, Dario Faggioli wrote:
> On Thu, 2012-07-19 at 16:22 +0200, Dario Faggioli wrote:
>> Interesting. That's really the kind of testing we need in order to
>> fine-tune the details. Thanks for doing this.
>>
>>> Then I started 32 guests, each 4 vCPUs and 1 GB of RAM.
>>> Now since the code prefers free memory so much over free CPUs, the
>>> placement was the following:
>>> node0: guests 2,5,8,11,14,17,20,25,30
>>> node1: guests 21,27
>>> node2: none
>>> node3: none
>>> node4: guests 1,4,7,10,13,16,19,23,29
>>> node5: guests 24,31
>>> node6: guests 3,6,9,12,15,18,22,28
>>> node7: guests 26,32
>>>
>>> As you can see, the nodes with more memory are _way_ overloaded, while
>>> the lower memory ones are underutilized. In fact the first 20 guests
>>> didn't use the other nodes at all.
>>> I don't care so much about the two memory-less nodes, but I'd like to
>>> know how you came to the magic "3" in the formula:
>>>
>>>> +
>>>> +    return sign(3*freememkb_diff + nrdomains_diff);
>>>> +}
>>>
>>
>> That all being said, this is the first time the patchset had the chance
>> to run on such a big system, so I'm definitely open to suggestion on how
>> to make that formula better reflect what we think is The Right
>> Thing!
>>
> Thinking more about this, I realize that I was implicitly assuming some
> symmetry in the amount of memory each node comes with, which is
> probably something I shouldn't have done...
>
> I really am not sure what to do here, perhaps treating the two metrics
> more evenly? Or maybe even reverse the logic and give nr_domains more
> weight?

I already replaced the 3 with 1, but that didn't change much. I think we
should reverse the priorities and give node load more weight than free
memory, since starving for CPU time is much worse than bad memory
latency. I will do some experiments...
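
To make the weighting question concrete, here is a minimal sketch of a
comparison with both weights exposed as parameters. It is not the libxl
code: struct cand, norm_diff() and cmp_weighted() are hypothetical names,
and the normalization is an assumption, since the thread only shows the
final sign(3*freememkb_diff + nrdomains_diff) line.

```c
#include <assert.h>

/* Hypothetical per-candidate summary; NOT the libxl structure. */
struct cand {
    double free_memkb;   /* free memory on the candidate's node(s) */
    double nr_domains;   /* domains already placed there */
};

/* Assumed normalization: relative difference in [-1, 1], 0 if both are 0. */
static double norm_diff(double a, double b)
{
    assert(a >= 0.0 && b >= 0.0);
    double m = a > b ? a : b;
    return m == 0.0 ? 0.0 : (a - b) / m;
}

/*
 * Returns a negative value if c1 is the better candidate, a positive
 * value if c2 is, and 0 on a tie. More free memory and fewer domains
 * both make a candidate better; w_mem and w_dom set their relative
 * importance (3 and 1 would mirror the quoted formula under the
 * normalization assumed above).
 */
static int cmp_weighted(const struct cand *c1, const struct cand *c2,
                        double w_mem, double w_dom)
{
    double mem = norm_diff(c2->free_memkb, c1->free_memkb);
    double dom = norm_diff(c1->nr_domains, c2->nr_domains);
    double s = w_mem * mem + w_dom * dom;

    return (s > 0) - (s < 0);
}
```

In this toy model, take a node with 16 GB free and 4 domains against one
with 8 GB free and 3 domains: the bigger node wins with weights 3:1 and
still wins with 1:1, and only loses once the domain count gets the
larger weight (for example 1:3).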

> I was also thinking whether it could be worthwhile to consider the total
> number of vcpus on a node instead of the number of domains, but again,
> that's not guaranteed to be any more meaningful (suppose there are a lot
> of idle vcpus)...

Right, I was thinking the same on the ride to work ;-)
What about this: we assume that 1P and 2P guests really use their vCPUs,
but that bigger guests use only a fraction of theirs?
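
One possible reading of that idea, as a minimal sketch: count the first
two vCPUs of a guest in full and every further vCPU at a fraction. The
function name guest_load() and the frac parameter are hypothetical, and
discounting only the vCPUs beyond the second (rather than all of them) is
an assumption, chosen so that the estimate never shrinks when a guest
gets more vCPUs.

```c
#include <assert.h>

/*
 * Hypothetical load estimate for one guest, in "busy vCPU" units.
 * frac is the assumed usage of each vCPU beyond the second one;
 * its value would have to come from experiments.
 */
static double guest_load(unsigned int nr_vcpus, double frac)
{
    assert(frac >= 0.0 && frac <= 1.0);

    if (nr_vcpus <= 2)
        return nr_vcpus;            /* 1P and 2P guests count in full */

    return 2.0 + frac * (nr_vcpus - 2);
}
```

With frac = 0.5, a 4-vCPU guest would count as 3 busy vCPUs and an
8-vCPU guest as 5. Summing guest_load() over the guests on a node would
then give a load figure to use in place of the plain domain count.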

Andre

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12