From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:45809)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1QqRt0-0006He-LX
	for qemu-devel@nongnu.org; Mon, 08 Aug 2011 11:37:03 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from )
	id 1QqRsy-0005pe-CT
	for qemu-devel@nongnu.org; Mon, 08 Aug 2011 11:37:02 -0400
Received: from mx1.redhat.com ([209.132.183.28]:15410)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1QqRsy-0005pQ-4E
	for qemu-devel@nongnu.org; Mon, 08 Aug 2011 11:37:00 -0400
Message-ID: <4E400292.1030005@redhat.com>
Date: Mon, 08 Aug 2011 18:36:50 +0300
From: Avi Kivity
MIME-Version: 1.0
References: <20110808032438.GC24764@valinux.co.jp>
	<4E3FAA53.4030602@redhat.com> <4E3FD774.7010502@codemonkey.ws>
	<4E3FFC9E.8050300@redhat.com> <4E4000BF.20708@codemonkey.ws>
In-Reply-To: <4E4000BF.20708@codemonkey.ws>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC] postcopy livemigration proposal
To: Anthony Liguori
Cc: kvm@vger.kernel.org, satoshi.itoh@aist.go.jp, t.hirofuchi@aist.go.jp,
	dlaor@redhat.com, Orit Wasserman, qemu-devel@nongnu.org,
	Isaku Yamahata

On 08/08/2011 06:29 PM, Anthony Liguori wrote:
>
>>>> - Efficient, reduce needed traffic no need to re-send pages.
>>>
>>> It's not quite that simple. Post-copy needs to introduce a protocol
>>> capable of requesting pages.
>>
>> Just another subsection.. (kidding), still it shouldn't be too
>> complicated, just an offset+pagesize and return page_content/error
>
> What I meant by this is that there is potentially a lot of round trip
> overhead. Pre-copy migration works well with reasonable high latency
> network connections because the downtime is capped only by the maximum
> latency sending from one point to another.
>
> But with something like this, the total downtime is
> 2*max_latency*nb_pagefaults. That's potentially pretty high.

Let's be generous and assume that the latency is dominated by page copy
time. So the total downtime is equal to the first live migration pass,
~20 sec for 2GB on 1GbE. It's distributed over potentially even more
time, though. If the guest does a lot of I/O, it may not be noticeable
(esp. if we don't copy over pages read from disk). If the guest is
cpu/memory bound, it'll probably suck badly.

>
> So it may be desirable to try to reduce nb_pagefaults by prefaulting
> in pages, etc. Suffice to say, this ends up getting complicated and
> may end up burning network traffic too.

Yeah, and prefaulting in the background adds latency to synchronous
requests.

This really needs excellent networking resources to work well.

-- 
error compiling committee.c: too many arguments to function
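
For anyone wanting to sanity-check the two estimates in this message, here is a rough back-of-the-envelope sketch in Python. It is not taken from any QEMU code. The function names, the 4 KiB page size, and the 0.5 ms one-way latency are assumptions chosen for illustration, and protocol overhead is ignored, which is one reason the raw copy time comes out a little under the ~20 sec quoted above.

```python
# Back-of-the-envelope model of the two downtime estimates in this thread.
# All names and constants here are illustrative assumptions, not QEMU code.

PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages (the thread does not state a page size)


def bulk_copy_seconds(ram_bytes, link_bits_per_sec):
    """Time to send all guest RAM once over the link, ignoring protocol overhead."""
    return ram_bytes * 8 / link_bits_per_sec


def fault_stall_seconds(max_latency_sec, nb_pagefaults):
    """The quoted bound: one request/response round trip per remote page fault."""
    return 2 * max_latency_sec * nb_pagefaults


if __name__ == "__main__":
    ram = 2 * 1024**3        # 2 GiB guest
    link = 1_000_000_000     # 1GbE, raw line rate in bits per second

    # "Generous" case: latency dominated by page copy time.
    print(f"one full pass: {bulk_copy_seconds(ram, link):.1f} s")

    # Latency-bound case: every page is faulted in individually.
    pages = ram // PAGE_SIZE
    one_way = 0.0005         # assumed 0.5 ms one-way latency
    print(f"{pages} faults: {fault_stall_seconds(one_way, pages):.1f} s stalled")
```

The first figure is the floor when the link is the bottleneck. The second shows how quickly per-fault round trips dominate once latency, rather than bandwidth, is the limit.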