I am confused by a problem:

I have a blade with 64 physical CPUs and 64G of physical RAM, and I defined only one VM with 1 CPU and 40G of RAM.

The first time I started the VM, it took only 3s, but the second start took 30s.

After studying it by printing logs, I located a place in the hypervisor that takes too much time: it accounts for 98% of the whole start time.

xen/common/page_alloc.c

/* Allocate 2^@order contiguous pages. */
static struct page_info *alloc_heap_pages(
    unsigned int zone_lo, unsigned int zone_hi,
    unsigned int node, unsigned int order, unsigned int memflags)
{
    [...]
        if ( pg[i].u.free.need_tlbflush )
        {
            /* Add in extra CPUs that need flushing because of this page. */
            cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
            tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
            cpus_or(mask, mask, extra_cpus_mask);
        }
    [...]

1 On the first start, need_tlbflush is not set for most pages, so the cost is small; on the second start, most of the RAM has already been used, which sets need_tlbflush to true, so the cost is high.

2 But when I repeated the same experiment on another blade with 16 physical CPUs and 64G of physical RAM, the second start took only 3s. After tracing the second start on both blades, I found that the number of times the pg[i].u.free.need_tlbflush check is entered is the same, so the number of CPUs is what causes the difference.

3 The code I pasted computes a mask of the CPUs whose TLBs need to be flushed. I traced the values during the start period:

    cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
    // after: mask=0, cpu_online_map=0xFFFFFFFFFFFFFFFF, extra_cpus_mask=0xFFFFFFFFFFFFFFFF
    tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
    // after: mask=0, extra_cpus_mask=0
    cpus_or(mask, mask, extra_cpus_mask);
    // after: mask=0, extra_cpus_mask=0

Every time, it starts with mask=0 and ends with the same result, mask=0, so the computation never adds a CPU to the flush mask, which seems meaningless during the start process.

4 The problem is that this seemingly meaningless code is very time-consuming, and the more CPUs there are, the more time it costs. It takes 30s on my blade with 64 CPUs, so some solution may be needed to improve the efficiency.