From mboxrd@z Thu Jan 1 00:00:00 1970
From: George Dunlap
Subject: Re: [PATCH 07 of 10 v2] libxl: optimize the calculation of how many VCPUs can run on a candidate
Date: Fri, 21 Dec 2012 16:00:00 +0000
Message-ID: <50D48780.70302@eu.citrix.com>
References: <5dc2571ae5faef87977c.1355944043@Solace>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5dc2571ae5faef87977c.1355944043@Solace>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli
Cc: Marcus Granado, Dan Magenheimer, Ian Campbell, Anil Madhavapeddy, Andrew Cooper, Juergen Gross, Ian Jackson, "xen-devel@lists.xen.org", Jan Beulich, Daniel De Graaf, Matt Wilson
List-Id: xen-devel@lists.xenproject.org

On 19/12/12 19:07, Dario Faggioli wrote:
> For choosing the best NUMA placement candidate, we need to figure out
> how many VCPUs are runnable on each of them. That requires going through
> all the VCPUs of all the domains and check their affinities.
>
> With this change, instead of doing the above for each candidate, we
> do it once for all, populating an array while counting. This way, when
> we later are evaluating candidates, all we need is summing up the right
> elements of the array itself.
>
> This reduces the complexity of the overall algorithm, as it moves a
> potentially expensive operation (for_each_vcpu_of_each_domain {})
> outside from the core placement loop, so that it is performed only
> once instead of (potentially) tens or hundreds of times.
>
> Signed-off-by: Dario Faggioli

You know this code best. :-)  I've looked it over and just have one
minor suggestion:

>           for (j = 0; j < nr_dom_vcpus; j++) {
> +            /* For each vcpu of each domain, increment the elements of
> +             * the array corresponding to the nodes where the vcpu runs */
> +            libxl_bitmap_set_none(&vcpu_nodemap);
> +            libxl_for_each_set_bit(k, vinfo[j].cpumap) {
> +                int node = tinfo[k].node;

I think I might rename "vcpu_nodemap" to something that suggests better
how it fits with the algorithm -- for instance, "counted_nodemap" or
"nodes_counted" -- something to suggest that this is how we avoid
counting the same vcpu on the same node multiple times.

 -George
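
For readers following the thread, the count-once idea and the role of the per-vcpu node map can be sketched in standalone C roughly as below. This is only an illustration under stated assumptions, not the libxl code from the patch: plain arrays stand in for libxl_bitmap, and the names cpu_to_node, count_vcpus_per_node, nr_vcpus_on_candidate and the NR_* sizes are hypothetical. Only nodes_counted is taken from the naming suggestion above.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS  8   /* hypothetical host size */
#define NR_NODES 2   /* hypothetical number of NUMA nodes */

/* Hypothetical topology; cpu_to_node[k] plays the role of tinfo[k].node. */
static const int cpu_to_node[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/*
 * Done once, outside the placement loop: for every vcpu, bump the counter
 * of each node it can run on. affinity[j][k] != 0 means vcpu j may run on
 * cpu k (a stand-in for vinfo[j].cpumap).
 */
void count_vcpus_per_node(const unsigned char affinity[][NR_CPUS],
                          int nr_vcpus, int vcpus_on_node[NR_NODES])
{
    int nodes_counted[NR_NODES];
    int j, k;

    memset(vcpus_on_node, 0, NR_NODES * sizeof(vcpus_on_node[0]));

    for (j = 0; j < nr_vcpus; j++) {
        /* Reset per vcpu: no node has been counted for this vcpu yet. */
        memset(nodes_counted, 0, sizeof(nodes_counted));

        for (k = 0; k < NR_CPUS; k++) {
            int node;

            if (!affinity[j][k])
                continue;

            node = cpu_to_node[k];
            assert(node >= 0 && node < NR_NODES);

            /* Several cpus of one node must count this vcpu only once. */
            if (!nodes_counted[node]) {
                nodes_counted[node] = 1;
                vcpus_on_node[node]++;
            }
        }
    }
}

/*
 * Done per candidate: sum the precomputed counters of the candidate's
 * nodes. A vcpu that can run on two nodes of the same candidate
 * contributes to both counters.
 */
int nr_vcpus_on_candidate(const unsigned char candidate_nodes[NR_NODES],
                          const int vcpus_on_node[NR_NODES])
{
    int n, total = 0;

    for (n = 0; n < NR_NODES; n++)
        if (candidate_nodes[n])
            total += vcpus_on_node[n];

    return total;
}
```

With this split, the walk over all vcpus happens once, and each candidate evaluation is a short sum over its nodes. The per-vcpu reset of nodes_counted is what keeps a vcpu whose affinity covers several cpus of one node from being counted more than once on that node.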