From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: slow live migration / xc_restore on xen4 pvops
Date: Wed, 02 Jun 2010 23:53:42 -0700
Message-ID: <4C075176.5010603@goop.org>
References: <20100603010418.GB2028@kremvax.cs.ubc.ca> <20100603064545.GB52378@zanzibar.kublai.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
In-Reply-To: <20100603064545.GB52378@zanzibar.kublai.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: keir.fraser@eu.citrix.com, andreas.olsowski@uni.leuphana.de, xen-devel@lists.xensource.com, Ian.Jackson@eu.citrix.com, edwin.zhai@intel.com
List-Id: xen-devel@lists.xenproject.org

On 06/02/2010 11:45 PM, Brendan Cully wrote:
> On Thursday, 03 June 2010 at 06:47, Keir Fraser wrote:
>
>> On 03/06/2010 02:04, "Brendan Cully" wrote:
>>
>>
>>> I've done a bit of profiling of the restore code and observed the
>>> slowness here too. It looks to me like it's probably related to
>>> superpage changes. The big hit appears to be at the front of the
>>> restore process during calls to allocate_mfn_list, under the
>>> normal_page case. It looks like we're calling
>>> xc_domain_memory_populate_physmap once per page here, instead of
>>> batching the allocation? I haven't had time to investigate further
>>> today, but I think this is the culprit.
>>>
>> Ccing Edwin Zhai. He wrote the superpage logic for domain restore.
>>
> Here's some data on the slowdown going from 2.6.18 to pvops dom0:
>
> I wrapped the call to allocate_mfn_list in uncanonicalize_pagetable
> to measure the time to do the allocation.
>
> kernel, min call time, max call time
> 2.6.18, 4 us, 72 us
> pvops, 202 us, 10696 us (!)
>
> It looks like pvops is dramatically slower to perform the
> xc_domain_memory_populate_physmap call!
>

That appears to be implemented as a raw hypercall, so the kernel has
very little to do with it. The only thing I can see there that might be
relevant is that the mlock hypercalls could be slow for some reason?

    J
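
[Editor's sketch, for context on the batching point raised above: the contrast
between one xc_domain_memory_populate_physmap call per page and one call for a
whole batch. The prototype shown is the Xen 4.0-era libxc one as best recalled;
the helper names populate_one_by_one and populate_batched are invented for
illustration and are not code from xc_domain_restore.c.]

    #include <xenctrl.h>

    /* Assumed 4.0-era prototype (returns 0 on success):
     *   int xc_domain_memory_populate_physmap(int xc_handle, uint32_t domid,
     *           unsigned long nr_extents, unsigned int extent_order,
     *           unsigned int mem_flags, xen_pfn_t *extent_start);
     */

    /* One hypercall per page: each call pays for locking/copying its
     * argument buffer and for hypervisor entry/exit -- the per-call
     * overhead the timings above attribute to allocate_mfn_list. */
    static int populate_one_by_one(int xc_handle, uint32_t domid,
                                   xen_pfn_t *pfns, unsigned long count)
    {
        unsigned long i;

        for (i = 0; i < count; i++)
            if (xc_domain_memory_populate_physmap(xc_handle, domid,
                                                  1 /* nr_extents */,
                                                  0 /* order-0 pages */,
                                                  0 /* mem_flags */,
                                                  &pfns[i]) != 0)
                return -1;
        return 0;
    }

    /* One hypercall for the whole batch: the per-call cost is paid once. */
    static int populate_batched(int xc_handle, uint32_t domid,
                                xen_pfn_t *pfns, unsigned long count)
    {
        return xc_domain_memory_populate_physmap(xc_handle, domid,
                                                 count, 0, 0, pfns);
    }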