From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: [RFC] postcopy livemigration proposal
Date: Mon, 08 Aug 2011 07:32:52 -0500
Message-ID: <4E3FD774.7010502@codemonkey.ws>
References: <20110808032438.GC24764@valinux.co.jp> <4E3FAA53.4030602@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, Orit Wasserman, t.hirofuchi@aist.go.jp, satoshi.itoh@aist.go.jp, qemu-devel@nongnu.org, Isaku Yamahata, Avi Kivity
To: dlaor@redhat.com
In-Reply-To: <4E3FAA53.4030602@redhat.com>
Errors-To: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
Sender: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
List-Id: kvm.vger.kernel.org

On 08/08/2011 04:20 AM, Dor Laor wrote:
> On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
>> This mail is on "Yabusame: Postcopy Live Migration for Qemu/KVM"
>> on which we'll give a talk at KVM-forum.
>> The purpose of this mail is to let developers know about it in advance
>> so that we can get better feedback on its design/implementation approach
>> early, before we start to implement it.
>>
>>
>> Background
>> ==========
>> * What is postcopy livemigration
>> It is yet another live migration mechanism for Qemu/KVM, which
>> implements the migration technique known as "postcopy" or "lazy"
>> migration. Just after the "migrate" command is invoked, the execution
>> host of a VM is instantaneously switched to a destination host.
>>
>> The benefit is that total migration time is shorter because it transfers
>> a page only once. On the other hand, precopy may send the same pages
>> again and again because they can be dirtied.
>> The switching time from the source to the destination is several
>> hundred milliseconds, so it enables quick load balancing.
>> For details, please refer to the papers.
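
As a rough illustration of the traffic argument above, here is a toy
model in Python. It is not taken from the Yabusame code: the function
names, the fixed "hot set" of pages that the guest re-dirties during
every copy round, and the round limit are all assumptions made for the
sketch, and it ignores the cost of the page-request protocol that
postcopy needs.

```python
def precopy_pages_sent(total_pages, hot_pages, max_rounds, stop_threshold):
    """Count pages sent by an iterative precopy (toy model).

    Assumption: while each round is being copied, the guest dirties the
    same `hot_pages` pages again, so they must be re-sent.
    """
    dirty = total_pages              # the first round sends all of RAM
    sent = 0
    for _ in range(max_rounds):
        sent += dirty                # copy everything currently dirty
        dirty = hot_pages            # guest re-dirtied its hot set meanwhile
        if dirty <= stop_threshold:  # small enough to stop the guest
            break
    return sent + dirty              # final stop-and-copy of the remainder


def postcopy_pages_sent(total_pages):
    """Postcopy pulls every page exactly once (toy model)."""
    return total_pages
```

With 1000 guest pages and a hot set of 100 pages,
precopy_pages_sent(1000, 100, 5, 10) gives 1500 pages against 1000 for
postcopy. With a hot set of only 5 pages, precopy converges after one
round and sends 1005 pages, so the gap depends heavily on the workload.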
>>
>> We believe this is useful for others, so we'd like to merge this
>> feature into the upstream qemu/kvm. The existing implementation that
>> we have right now is very ad-hoc because it's for academic research.
>> For the upstream merge, we're starting to re-design/implement it and
>> we'd like to get feedback early. Although many improvements/optimizations
>> are possible, we should implement/merge a simple/clean, but extensible,
>> version first and then improve/optimize it later.
>>
>> postcopy livemigration will be introduced as an optional feature. The
>> existing precopy livemigration remains the default behavior.
>>
>>
>> * related links:
>> project page
>> http://sites.google.com/site/grivonhome/quick-kvm-migration
>>
>> Enabling Instantaneous Relocation of Virtual Machines with a
>> Lightweight VMM Extension,
>> (proof-of-concept, ad-hoc prototype. not a new design)
>> http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-paper.pdf
>> http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-talk.pdf
>>
>> Reactive consolidation of virtual machines enabled by postcopy live
>> migration
>> (advantage for VM consolidation)
>> http://portal.acm.org/citation.cfm?id=1996125
>> http://www.emn.fr/x-info/ascola/lib/exe/fetch.php?media=internet:vtdc-postcopy.pdf
>>
>>
>> Qemu wiki
>> http://wiki.qemu.org/Features/PostCopyLiveMigration
>>
>>
>> Design/Implementation
>> =====================
>> The basic idea of postcopy livemigration is to use a sort of distributed
>> shared memory between the migration source and destination.
>>
>> The migration procedure looks like
>> - start migration
>>   stop the guest VM on the source and send the machine states except
>>   guest RAM to the destination
>> - resume the guest VM on the destination without guest RAM contents
>> - Hook guest access to pages, and pull page contents from the source
>>   This continues until all the pages are pulled to the destination
>>
>> The big picture is depicted at
>> http://wiki.qemu.org/File:Postcopy-livemigration.png
>
> That's terrific (nice video also)!
> Orit and myself had the exact same idea too (now we can't patent it..).
>
> Advantages:
> - No down time due to memory copying.

But non-deterministic down time due to network latency while trying to
satisfy a page fault.

> - Efficient: reduces needed traffic, no need to re-send pages.

It's not quite that simple. Post-copy needs to introduce a protocol
capable of requesting pages.

I think in presenting something like this, it's important to collect
quite a bit of performance data. I'd suggest doing runs while running
jitterd in the guest to attempt to quantify the actual downtime
experienced too.

http://git.codemonkey.ws/cgit/jitterd.git/

There's a lot of potential in something like this, but it's not obvious
to me whether it's a net win. Should make for a very interesting
presentation :-)

> - Reduce overall RAM consumption of the source and destination
>   as opposed to current live migration (both the source and the
>   destination allocate the memory until the live migration
>   completes). We can free copied memory once the destination guest
>   has received it and save RAM.
> - Increase parallelism for SMP guests: we can have multiple
>   virtual CPUs handle their demand paging. Less time to hold a
>   global lock, less thread contention.
> - Virtual machines are using more and more memory resources;
>   for a virtual machine with a very large working set, doing live
>   migration with reasonable down time is impossible today.

This is really just a limitation of our implementation.
In theory, pre-copy allows you to exert fine-grained resource control
over the guest, which you can use to encourage convergence.

> Disadvantages:
> - During the live migration the guest will run slower than in
>   today's live migration. We need to remember that even today
>   guests suffer from a performance penalty on the source during the
>   COW stage (memory copy).
> - Failure of the source or destination or the network will cause
>   us to lose the running virtual machine. Those failures are very
>   rare.
>   In case there is shared storage we can store a copy of the
>   memory there, which can be recovered in case of such a failure.
>
> Overall, it looks like a better approach for the vast majority of cases.
> Hope it will get merged to kvm and become the default way.

One thing I think we need to do is put together a live migration
roadmap. We've got a lot of invasive efforts underway with live
migration and I fear that without some planning and serialization, some
of this useful work will get lost.

Regards,

Anthony Liguori

>>
>>
>> There are several design points.
>> - who takes care of pulling page contents.
>>   an independent daemon vs a thread in qemu
>>   The daemon approach is preferable because an independent daemon would
>>   make it easy to debug the postcopy memory mechanism without qemu.
>>   If required, it wouldn't be difficult to convert a daemon into
>>   a thread in qemu
>>
>> - connection between the source and the destination
>>   The connection for live migration can be re-used after sending machine
>>   state.
>>
>> - transfer protocol
>>   The existing protocol can be extended.
>>
>> - hooking guest RAM access
>>   Introduce a character device to handle page faults.
>>   When a page fault occurs, it queues a page request up to the user space
>>   daemon at the destination. The daemon pulls the page contents from the
>>   source and serves them into the character device. Then the page fault
>>   is resolved.
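
The fault-handling flow in the last design point can be sketched in a
few lines of Python. This is only an in-process model under stated
assumptions, not the proposed driver or daemon: the Source and
Destination classes, the fetch() call standing in for the network
request, and the 4096-byte page size are all hypothetical names and
values chosen for illustration.

```python
from collections import deque

PAGE_SIZE = 4096  # assumed page size for the sketch


class Source:
    """Stands in for the migration source, which still holds guest RAM."""

    def __init__(self, pages):
        self.pages = pages            # page number -> bytes of that page
        self.requests_served = 0

    def fetch(self, pfn):
        # In a real setup this would be a request over the migration link.
        self.requests_served += 1
        return self.pages[pfn]


class Destination:
    """Guest RAM on the destination; every page starts out absent."""

    def __init__(self, source):
        self.source = source
        self.present = {}             # page number -> bytes already pulled
        self.pending = deque()        # fault queue consumed by the daemon

    def read(self, addr):
        pfn, offset = divmod(addr, PAGE_SIZE)
        if pfn not in self.present:   # models the hooked page fault
            self.pending.append(pfn)  # queue a page request for the daemon
            self.run_daemon()         # the access waits until it is served
        return self.present[pfn][offset]

    def run_daemon(self):
        # Models the user space daemon: pull each requested page once
        # from the source and install it, which resolves the fault.
        while self.pending:
            pfn = self.pending.popleft()
            if pfn not in self.present:
                self.present[pfn] = self.source.fetch(pfn)
```

A second access to an already pulled page does not reach the source
again, which is the "transfer a page only once" property; the latency
of fetch() is what the faulting vcpu would see as stall time.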
>
> Isn't there a simpler way of using madvise verb to mark that the
> destination guest RAM will need paging?
>
> Cheers and looking forward to the presentation over the kvm forum,
> Dor
>
>>
>>
>> * More on hooking guest RAM access
>> There are several candidates for the implementation. Our preference is
>> the character device approach.
>>
>> - inserting hooks everywhere in qemu/kvm
>>   This is impractical
>>
>> - backing store for guest ram
>>   a block device or a file can be used to back guest RAM.
>>   Thus hook the guest ram access.
>>
>>   pros
>>   - new device driver isn't needed.
>>   cons
>>   - future improvement would be difficult
>>   - some KVM host features (KSM, THP) wouldn't work
>>
>> - character device
>>   qemu mmap()s the dedicated character device, and then hooks page faults.
>>
>>   pros
>>   - straightforward approach
>>   - future improvement would be easy
>>   cons
>>   - new driver is needed
>>   - some KVM host features (KSM, THP) wouldn't work
>>     They check if a given VMA is anonymous. This can be fixed.
>>
>> - swap device
>>   When creating the guest, it is set up as if all the guest RAM is swapped
>>   out to a dedicated swap device, which may be an nbd disk (or some kind
>>   of user space block device, BUSE?).
>>   When the VM tries to access memory, swap-in is triggered and IO to the
>>   swap device is issued. Then the IO to swap is routed to the daemon
>>   in user space with the nbd protocol (or BUSE, AOE, iSCSI...). The daemon
>>   pulls pages from the migration source and services the IO request.
>>
>>   pros
>>   - After the page transfer is complete, everything is same as normal case.
>>   - no new device driver is needed
>>   cons
>>   - future improvement would be difficult
>>   - administration: setting up nbd, swap device
>>
>> Thanks in advance
>
>