From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] postcopy livemigration proposal
Date: Mon, 08 Aug 2011 16:42:33 -0500
To: Yaniv Kaul
Cc: dlaor@redhat.com, kvm@vger.kernel.org, Orit Wasserman,
 t.hirofuchi@aist.go.jp, satoshi.itoh@aist.go.jp, qemu-devel@nongnu.org,
 Isaku Yamahata, Avi Kivity

On 08/08/2011 04:40 AM, Yaniv Kaul wrote:
> On 08/08/2011 12:20, Dor Laor wrote:
>> On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
>>> Design/Implementation
>>> =====================
>>> The basic idea of postcopy live migration is to use a sort of
>>> distributed shared memory between the migration source and the
>>> destination.
>>>
>>> The migration procedure looks like this:
>>> - Start migration: stop the guest VM on the source and send the
>>>   machine state, except guest RAM, to the destination.
>>> - Resume the guest VM on the destination without the guest RAM
>>>   contents.
>>> - Hook guest access to pages and pull page contents from the
>>>   source. This continues until all the pages have been pulled to
>>>   the destination.
>>>
>>> The big picture is depicted at
>>> http://wiki.qemu.org/File:Postcopy-livemigration.png
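As a rough illustration of the pull-based scheme above (a userspace
sketch only, not the proposed implementation: fetch_page() is a
hypothetical stand-in for the transfer from the source, and the real
proposal hooks guest page access in the kernel rather than relying on
SIGSEGV):

    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define GUEST_RAM_PAGES 16

    static char *guest_ram;
    static long page_size;

    /* Hypothetical stand-in for pulling one page from the source. */
    static void fetch_page(char *dst, size_t page_index)
    {
        memset(dst, 'A' + (page_index % 26), page_size);
    }

    /* First touch of a PROT_NONE page lands here; pull the page,
     * then let the faulting instruction retry. */
    static void fault_handler(int sig, siginfo_t *si, void *ctx)
    {
        char *page = (char *)((uintptr_t)si->si_addr &
                              ~(uintptr_t)(page_size - 1));

        /* mprotect() in a signal handler is not strictly portable,
         * but it is the usual way to demo userspace demand paging. */
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
        fetch_page(page, (size_t)(page - guest_ram) / page_size);
    }

    int main(void)
    {
        page_size = sysconf(_SC_PAGESIZE);

        /* "Guest RAM" arrives empty: every page faults on first
         * access after the guest resumes on the destination. */
        guest_ram = mmap(NULL, GUEST_RAM_PAGES * page_size, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (guest_ram == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* The "guest" runs immediately; pages are pulled on demand. */
        printf("page 0 starts with '%c'\n", guest_ram[0]);
        printf("page 3 starts with '%c'\n", guest_ram[3 * page_size]);
        return 0;
    }

Once a page has been pulled, the source can drop its copy, which is
where the RAM savings mentioned below come from.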
>> That's terrific (nice video also)!
>> Orit and I had the exact same idea too (now we can't patent it...).
>>
>> Advantages:
>> - No downtime due to memory copying.
>> - Efficient: reduces the needed traffic, since there is no need to
>>   re-send pages.
>> - Reduces the overall RAM consumption of the source and the
>>   destination, as opposed to current live migration, where both the
>>   source and the destination allocate the memory until the live
>>   migration completes. We can free copied memory once the
>>   destination guest has received it and save RAM.
>> - Increases parallelism for SMP guests: multiple virtual CPUs can
>>   handle their own demand paging. Less time holding a global lock,
>>   less thread contention.
>> - Virtual machines are using more and more memory, and for a
>>   virtual machine with a very large working set, doing live
>>   migration with reasonable downtime is impossible today.
>>
>> Disadvantages:
>> - During the live migration the guest will run slower than in
>>   today's live migration. We need to remember that even today
>>   guests suffer a performance penalty on the source during the COW
>>   stage (memory copy).
>> - Failure of the source, the destination, or the network will cause
>>   us to lose the running virtual machine. Those failures are very
>>   rare.
>
> I highly doubt that's acceptable in enterprise deployments.

I don't think you can make blanket statements about enterprise
deployments. A lot of enterprises are increasingly building fault
tolerance into their applications, expecting that the underlying
hardware will fail. With cloud environments like EC2 experiencing
failures on a fairly regular basis, this is only becoming more common.

So I really don't view this as a critical issue. It certainly would be
if this were the only mechanism available, but as long as we can also
support pre-copy migration, it's fine.

Regards,

Anthony Liguori