Message-ID: <4E405849.4060800@codemonkey.ws>
Date: Mon, 08 Aug 2011 16:42:33 -0500
From: Anthony Liguori
In-Reply-To: <4E3FAF12.5050504@redhat.com>
References: <20110808032438.GC24764@valinux.co.jp> <4E3FAA53.4030602@redhat.com> <4E3FAF12.5050504@redhat.com>
Subject: Re: [Qemu-devel] [RFC] postcopy livemigration proposal
To: Yaniv Kaul
Cc: kvm@vger.kernel.org, satoshi.itoh@aist.go.jp, t.hirofuchi@aist.go.jp,
    dlaor@redhat.com, qemu-devel@nongnu.org, Orit Wasserman,
    Avi Kivity, Isaku Yamahata

On 08/08/2011 04:40 AM, Yaniv Kaul wrote:
> On 08/08/2011 12:20, Dor Laor wrote:
>> On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
>>> Design/Implementation
>>> =====================
>>> The basic idea of postcopy livemigration is to use a sort of
>>> distributed shared memory between the migration source and
>>> destination.
>>>
>>> The migration procedure looks like:
>>> - start migration
>>>   stop the guest VM on the source and send the machine state, except
>>>   guest RAM, to the destination
>>> - resume the guest VM on the destination without guest RAM contents
>>> - hook guest access to pages, and pull page contents from the source
>>>   This continues until all the pages are pulled to the destination.
>>>
>>> The big picture is depicted at
>>> http://wiki.qemu.org/File:Postcopy-livemigration.png
>>
>> That's terrific (nice video also)!
>> Orit and myself had the exact same idea too (now we can't patent it..).
>>
>> Advantages:
>> - No downtime due to memory copying.
>> - Efficient: reduces the needed traffic, since there is no need to
>>   re-send pages.
>> - Reduces overall RAM consumption of the source and destination,
>>   as opposed to current live migration (where both the source and
>>   the destination allocate the memory until the live migration
>>   completes). We can free each page on the source once the
>>   destination guest has received it, and save RAM.
>> - Increases parallelism for SMP guests: multiple virtual CPUs can
>>   handle their own demand paging. Less time holding a global lock,
>>   less thread contention.
>> - Virtual machines are using more and more memory resources; for a
>>   virtual machine with a very large working set, doing live
>>   migration with reasonable downtime is impossible today.
>>
>> Disadvantages:
>> - During the live migration the guest will run slower than in
>>   today's live migration. We need to remember that even today
>>   guests suffer a performance penalty on the source during the
>>   COW stage (memory copy).
>> - Failure of the source, the destination, or the network will cause
>>   us to lose the running virtual machine. Such failures are very
>>   rare.
>
> I highly doubt that's acceptable in enterprise deployments.

I don't think you can make blanket statements about enterprise
deployments.
A lot of enterprises are increasingly building fault tolerance into
their applications, expecting that the underlying hardware will fail.
With cloud environments like EC2 that experience failures on a pretty
regular basis, this is just becoming all the more common.

So I really don't view this as a critical issue. It certainly would be
if postcopy were the only mechanism available, but as long as we can
also support pre-copy migration it would be fine.

Regards,

Anthony Liguori
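P.S. For anyone who wants to play with the idea, here is a toy sketch
of the pull protocol from Isaku's proposal: the destination resumes
with no RAM contents and fetches each page from the source on first
access, and the source frees a page once it has been pulled. The
Source/Destination classes and the page layout here are made up for
illustration; this is not code from the actual patches.

```python
# Toy model of postcopy live migration's demand-paging pull.
# Illustrative only: names and structure are hypothetical.
PAGE_SIZE = 4096


class Source:
    """Holds the guest RAM left behind after machine state is sent."""

    def __init__(self, ram: bytes):
        self.ram = bytearray(ram)

    def pull_page(self, pfn: int) -> bytes:
        """Return one page and free it locally.

        Freeing as we go is why overall RAM use drops compared to
        pre-copy, where both sides hold a full copy until the end.
        """
        start = pfn * PAGE_SIZE
        page = bytes(self.ram[start:start + PAGE_SIZE])
        self.ram[start:start + PAGE_SIZE] = b"\x00" * PAGE_SIZE
        return page


class Destination:
    """Resumes immediately; 'faults' pull pages over the network."""

    def __init__(self, source: Source, num_pages: int):
        self.source = source
        self.pages = [None] * num_pages  # None == not yet pulled

    def read(self, addr: int) -> int:
        pfn, off = divmod(addr, PAGE_SIZE)
        if self.pages[pfn] is None:      # page fault: pull on demand
            self.pages[pfn] = self.source.pull_page(pfn)
        return self.pages[pfn][off]
```

Migration completes once every entry in `pages` is populated; until
then, each first touch of a page costs one round trip to the source,
which is the slowdown mentioned under the disadvantages above.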