* alloc_heap_pages is low efficient with more CPUs
@ 2012-10-11 15:18 tupeng212
  2012-10-11 15:41 ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: tupeng212 @ 2012-10-11 15:18 UTC (permalink / raw)
  To: xen-devel



I am puzzled by a problem:
I have a blade with 64 physical CPUs and 64 GB of physical RAM, and have defined only one VM with 1 vCPU and 40 GB of RAM.
The first time I started the VM it took just 3 s, but the second start took 30 s.
By adding log output I located a place in the hypervisor that accounts for most of the time,
about 98% of the whole start-up time.

xen/common/page_alloc.c
/* Allocate 2^@order contiguous pages. */
static struct page_info *alloc_heap_pages(
    unsigned int zone_lo, unsigned int zone_hi,
    unsigned int node, unsigned int order, unsigned int memflags)
{
    ...
    /* Per-page loop over the 2^order pages of the allocation. */
    for ( i = 0; i < (1 << order); i++ )
    {
        ...
        if ( pg[i].u.free.need_tlbflush )
        {
            /* Add in extra CPUs that need flushing because of this page. */
            cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
            tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
            cpus_or(mask, mask, extra_cpus_mask);
        }
        ...
    }
    ...
}

1 During the first start, need_tlbflush is false for almost every page, so this branch costs little; during the second
  start, most of the RAM has already been used by (and freed back from) the VM, so need_tlbflush is true for most pages
  and the branch costs a lot. A simplified sketch of where the flag is set follows this point.
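For reference, the flag is set when a domain's pages are returned to the heap: free_heap_pages() stamps every page that
had an owner and marks it as needing a TLB-flush check before it may be handed out again. A simplified, paraphrased
sketch (not a verbatim quote of the source):

/* xen/common/page_alloc.c, free_heap_pages() -- simplified sketch */
for ( i = 0; i < (1 << order); i++ )
{
    /* A page that belonged to a domain may still be reachable through
     * stale TLB entries on some CPUs, so flag it and remember when it
     * was freed.  Pages with no owner need no safety flush. */
    pg[i].u.free.need_tlbflush = (page_get_owner(&pg[i]) != NULL);
    if ( pg[i].u.free.need_tlbflush )
        pg[i].tlbflush_timestamp = tlbflush_current_time();
    ...
}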

2 But I repeated the same experiment on another blade with 16 physical CPUs and 64 GB of physical RAM, and there the
  second start took only 3 s. Comparing traces of the two second starts, I found that the number of times the
  pg[i].u.free.need_tlbflush branch is entered is the same on both machines; only the number of CPUs makes the difference.

3 The code I pasted computes a mask of CPUs whose TLBs would have to be flushed before the pages can be reused. I traced
  the values during the start-up period:
cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
//after, mask=0, cpu_online_map=0xFFFFFFFFFFFFFFFF, extra_cpus_mask=0xFFFFFFFFFFFFFFFF
tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
//after, mask=0, extra_cpus_mask=0
cpus_or(mask, mask, extra_cpus_mask);
//after, mask=0, extra_cpus_mask=0

  Every iteration starts with mask=0 and ends with mask=0 again, so no TLB flush is ever actually triggered; all of this
  per-page computation appears to be wasted work during the start-up process. A sketch of what tlbflush_filter() does,
  and why its cost grows with the number of CPUs, follows.
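For reference, tlbflush_filter() is roughly the following (reconstructed from memory, so treat the exact definition as an
assumption). The important part is that it walks every CPU still in the mask and compares that CPU's last-flush time with
the page's timestamp, so every page with need_tlbflush set costs a pass over all online CPUs:

/* xen/include/asm-x86/flushtlb.h -- approximate definition */
#define tlbflush_filter(mask, page_timestamp)                            \
do {                                                                      \
    unsigned int cpu;                                                     \
    /* Drop CPUs that have flushed since the page was freed. */           \
    for_each_cpu_mask ( cpu, mask )                                       \
        if ( !NEED_FLUSH(per_cpu(tlbflush_time, cpu), page_timestamp) )   \
            cpu_clear(cpu, mask);                                         \
} while ( 0 )

This is why the per-page cost grows with the number of online CPUs.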

4 The problem is that this apparently useless code is very time-consuming, and the more CPUs there are, the more time it
  costs: 30 s on my blade with 64 CPUs. Some way to improve its efficiency seems to be needed; one possible direction is
  sketched below.
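One idea (my own sketch, not an existing patch; the locals tlbflush_needed and max_stamp are made-up names) would be to
stop doing the cpumask arithmetic for every page. Since tlbflush_current_time() only moves forward, it should be enough
to remember the newest timestamp of any page that needs a flush, and to run the per-CPU filtering once per allocation
instead of once per page:

/* New locals in alloc_heap_pages() (hypothetical names): */
bool_t   tlbflush_needed = 0;
uint32_t max_stamp = 0;

/* Inside the per-page loop, replace the cpumask arithmetic with: */
if ( pg[i].u.free.need_tlbflush )
{
    if ( !tlbflush_needed || (pg[i].tlbflush_timestamp > max_stamp) )
        max_stamp = pg[i].tlbflush_timestamp;   /* keep only the newest stamp */
    tlbflush_needed = 1;
}

/* After the loop, do the expensive part exactly once: */
if ( tlbflush_needed )
{
    cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
    tlbflush_filter(extra_cpus_mask, max_stamp);
    cpus_or(mask, mask, extra_cpus_mask);
}

The existing code after the loop that checks mask and issues the TLB flush could stay unchanged. (Timestamp wrap-around
is ignored in this sketch.)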
  



tupeng

