From: Yaniv Kaul <ykaul@redhat.com>
To: dlaor@redhat.com
Cc: kvm@vger.kernel.org, Orit Wasserman <owasserm@redhat.com>,
	t.hirofuchi@aist.go.jp, satoshi.itoh@aist.go.jp,
	qemu-devel@nongnu.org, Isaku Yamahata <yamahata@valinux.co.jp>,
	Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] [RFC] postcopy livemigration proposal
Date: Mon, 08 Aug 2011 12:40:34 +0300
Message-ID: <4E3FAF12.5050504@redhat.com>
In-Reply-To: <4E3FAA53.4030602@redhat.com>

On 08/08/2011 12:20, Dor Laor wrote:
> On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
>> This mail is on "Yabusame: Postcopy Live Migration for Qemu/KVM",
>> on which we'll give a talk at KVM Forum.
>> The purpose of this mail is to let developers know about it in advance,
>> so that we can get better feedback on its design/implementation approach
>> before we start implementing it.
>>
>>
>> Background
>> ==========
>> * What is postcopy live migration?
>> It is yet another live migration mechanism for Qemu/KVM, which
>> implements the migration technique known as "postcopy" or "lazy"
>> migration. Just after the "migrate" command is invoked, the execution
>> host of a VM is instantaneously switched to a destination host.
>>
>> The benefit is that total migration time is shorter, because each page
>> is transferred only once; precopy, on the other hand, may send the same
>> pages again and again as they keep getting dirtied.
>> The switchover time from the source to the destination is several
>> hundred milliseconds, which enables quick load balancing.
>> For details, please refer to the papers.
>>
>> We believe this is useful for others, so we would like to merge this
>> feature into upstream qemu/kvm. The implementation that we have right
>> now is very ad hoc because it was written for academic research.
>> For the upstream merge, we are starting to re-design/implement it and
>> we'd like to get feedback early. Although many improvements/optimizations
>> are possible, we should first implement/merge a simple and clean, but
>> extensible, version and then improve/optimize it later.
>>
>> Postcopy live migration will be introduced as an optional feature. The
>> existing precopy live migration remains the default behavior.
>>
>>
>> * related links:
>> project page
>> http://sites.google.com/site/grivonhome/quick-kvm-migration
>>
>> Enabling Instantaneous Relocation of Virtual Machines with a
>> Lightweight VMM Extension,
>> (proof-of-concept, ad-hoc prototype. not a new design)
>> http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-paper.pdf
>> http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-talk.pdf
>>
>> Reactive consolidation of virtual machines enabled by postcopy live 
>> migration
>> (advantage for VM consolidation)
>> http://portal.acm.org/citation.cfm?id=1996125
>> http://www.emn.fr/x-info/ascola/lib/exe/fetch.php?media=internet:vtdc-postcopy.pdf 
>>
>>
>> Qemu wiki
>> http://wiki.qemu.org/Features/PostCopyLiveMigration
>>
>>
>> Design/Implementation
>> =====================
>> The basic idea of postcopy live migration is to use a sort of distributed
>> shared memory between the migration source and destination.
>>
>> The migration procedure looks like
>>    - start migration
>>      stop the guest VM on the source and send the machine state except
>>      guest RAM to the destination
>>    - resume the guest VM on the destination without guest RAM contents
>>    - hook guest access to pages, and pull page contents from the source
>>      This continues until all the pages are pulled to the destination
>>
>>    The big picture is depicted at
>>    http://wiki.qemu.org/File:Postcopy-livemigration.png
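
Just to check my understanding of the fault path, here is a minimal sketch
of the destination-side pull loop (plain C; the fault-notification fd, the
wire format and all names below are my own assumptions, not your actual
code):

/*
 * Minimal sketch (assumptions, not the Yabusame code): the destination
 * daemon blocks on fault notifications, pulls each missing page from the
 * source over the reused migration socket, and pushes it back so the
 * faulting vCPU can continue.  Error handling and partial reads omitted.
 */
#include <stdint.h>
#include <unistd.h>

#define PAGE_SIZE 4096

struct fault_req  { uint64_t gfn; };                      /* kernel -> daemon */
struct page_reply { uint64_t gfn; uint8_t data[PAGE_SIZE]; }; /* src -> daemon */

static void pull_pages(int fault_fd, int src_sock)
{
    struct fault_req req;
    struct page_reply reply;

    /* loop until the fault channel reports that every page has arrived */
    while (read(fault_fd, &req, sizeof(req)) == sizeof(req)) {
        /* ask the source for the faulting guest frame number */
        write(src_sock, &req.gfn, sizeof(req.gfn));
        read(src_sock, &reply, sizeof(reply));

        /* install the page; this resolves the fault and wakes the vCPU */
        write(fault_fd, &reply, sizeof(reply));
    }
}
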
>
> That's terrific (nice video also)!
> Orit and I had the exact same idea (now we can't patent it...).
>
> Advantages:
>         - No downtime due to memory copying.
>         - Efficient: reduces traffic, since there is no need to re-send
>         pages.
>         - Reduces the overall RAM consumption of the source and
>         destination compared to current live migration (where both the
>         source and the destination allocate the memory until the live
>         migration completes). We can free a page on the source once the
>         destination guest has received it and save RAM.
>         - Increased parallelism: for SMP guests we can have multiple
>         virtual CPUs handle their demand paging. Less time holding a
>         global lock, less thread contention.
>         - Virtual machines are using more and more memory; for a
>         virtual machine with a very large working set, doing live
>         migration with reasonable downtime is impossible today.
>
> Disadvantages:
>         - During the live migration the guest will run slower than in
>         today's live migration. We need to remember that even today
>         guests suffer a performance penalty on the source during the
>         COW stage (memory copy).
>         - Failure of the source, the destination or the network will
>         cause us to lose the running virtual machine. Such failures are
>         very rare.

I highly doubt that's acceptable in enterprise deployments.

>         If there is shared storage we can store a copy of the memory
>         there, which can be recovered in case of such a failure.
>
> Overall, it looks like a better approach for the vast majority of cases.
> Hope it will get merged into kvm and become the default.
>
>>
>>
>> There are several design points.
>>    - who takes care of pulling page contents:
>>      an independent daemon vs. a thread in qemu.
>>      The daemon approach is preferable because an independent daemon
>>      would make it easy to debug the postcopy memory mechanism without
>>      qemu. If required, it wouldn't be difficult to convert the daemon
>>      into a thread in qemu.

How about the async page fault mechanism?
Y.

>>
>>    - connection between the source and the destination
>>      The connection used for live migration can be reused after sending
>>      the machine state.
>>
>>    - transfer protocol
>>      The existing protocol can be extended.
>>
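
If the migration connection is reused, I guess the protocol extension could
be as small as one request/reply pair for the pull phase. A hypothetical
message layout, purely for illustration (this is not the existing migration
stream format; the command values and field names are made up):

#include <stdint.h>

/* Hypothetical postcopy messages carried over the reused migration
 * connection; command values, field names and packing are illustrative
 * assumptions only. */
enum postcopy_cmd {
    PC_PAGE_REQ  = 1,   /* destination -> source: "send me this page"  */
    PC_PAGE_DATA = 2,   /* source -> destination: page contents follow */
    PC_ALL_SENT  = 3,   /* source -> destination: background copy done */
};

struct postcopy_hdr {
    uint32_t cmd;       /* one of enum postcopy_cmd       */
    uint32_t len;       /* payload length in bytes        */
    uint64_t gfn;       /* guest frame number referred to */
} __attribute__((packed));
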
>>    - hooking guest RAM access
>>      Introduce a character device to handle page faults.
>>      When a page fault occurs, it queues a page request up to the
>>      user-space daemon at the destination. The daemon pulls the page
>>      contents from the source and feeds them into the character device,
>>      and the page fault is then resolved.
>
> Isn't there a simpler way, using the madvise verb to mark that the
> destination guest RAM will need paging?
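
As far as I know, plain madvise() today has no flag that lets user space
service the resulting fault, so some new kernel interface seems needed
either way. For the character-device variant, I imagine the qemu side could
look roughly like this (the device name and its semantics are assumptions,
not an existing driver):

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical qemu side of the character-device approach: guest RAM is
 * backed by the device instead of anonymous memory, so the first touch of
 * each page faults into the driver, which queues a request to the
 * user-space daemon and sleeps until the page arrives from the source.
 * "/dev/postcopy-mem" is a made-up name.
 */
static void *map_postcopy_ram(size_t ram_size)
{
    int fd = open("/dev/postcopy-mem", O_RDWR);

    if (fd < 0)
        return MAP_FAILED;

    return mmap(NULL, ram_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

The resulting VMA is no longer anonymous, which is exactly the KSM/THP
caveat mentioned below.
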
>
> Cheers and looking forward to the presentation over the kvm forum,
> Dor
>
>>
>>
>> * More on hooking guest RAM access
>> There are several candidates for the implementation. Our preference is
>> the character device approach.
>>
>>    - inserting hooks everywhere in qemu/kvm
>>      This is impractical.
>>
>>    - backing store for guest RAM
>>      A block device or a file can be used to back guest RAM, and guest
>>      RAM access is hooked through it.
>>
>>      pros
>>      - no new device driver is needed
>>      cons
>>      - future improvement would be difficult
>>      - some KVM host features (KSM, THP) wouldn't work
>>
>>    - character device
>>      qemu mmap()s a dedicated character device, and page faults on it
>>      are hooked.
>>
>>      pros
>>      - straightforward approach
>>      - future improvement would be easy
>>      cons
>>      - a new driver is needed
>>      - some KVM host features (KSM, THP) wouldn't work
>>        They check whether a given VMA is anonymous. This can be fixed.
>>
>>    - swap device
>>      When the guest is created, it is set up as if all the guest RAM
>>      were swapped out to a dedicated swap device, which may be an nbd
>>      disk (or some kind of user-space block device, BUSE?).
>>      When the VM tries to access memory, swap-in is triggered and IO to
>>      the swap device is issued. The IO is then routed to the daemon in
>>      user space via the nbd protocol (or BUSE, AoE, iSCSI...). The
>>      daemon pulls pages from the migration source and services the IO
>>      request.
>>
>>      pros
>>      - after the page transfer is complete, everything is the same as
>>        in the normal case
>>      - no new device driver is needed
>>      cons
>>      - future improvement would be difficult
>>      - administration: setting up nbd and the swap device
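
For the swap-device routing described above, the daemon's read path might
amount to something like this (a sketch under my own assumptions;
pull_from_source() and the data structures are made up, and the nbd/BUSE
server loop and wire details are omitted):

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Sketch of the user-space swap backend's read handler: a swap-in by the
 * guest turns into a block read of the corresponding slot, and pages that
 * have not been pulled yet are fetched from the migration source first.
 * Assumes reads are page-aligned and at most one page long. */
extern uint8_t *already_pulled;                    /* one flag per guest page */
extern uint8_t *local_copy;                        /* pages received so far   */
int pull_from_source(uint64_t gfn, uint8_t *dst);  /* hypothetical helper     */

int handle_swap_read(uint64_t offset, uint32_t len, uint8_t *out)
{
    uint64_t gfn = offset / PAGE_SIZE;

    if (!already_pulled[gfn]) {
        if (pull_from_source(gfn, local_copy + gfn * PAGE_SIZE) < 0)
            return -1;                             /* report IO error upward  */
        already_pulled[gfn] = 1;
    }

    memcpy(out, local_copy + offset, len);
    return 0;
}
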
>>
>> Thanks in advance
>

