Re: [Qemu-devel] [RFC] postcopy livemigration proposal

From: "Nadav Har'El" <nyh@math.technion.ac.il>
To: Dor Laor <dlaor@redhat.com>
Cc: kvm@vger.kernel.org, Orit Wasserman <owasserm@redhat.com>,
	t.hirofuchi@aist.go.jp, satoshi.itoh@aist.go.jp,
	qemu-devel@nongnu.org, Isaku Yamahata <yamahata@valinux.co.jp>,
	Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] [RFC] postcopy livemigration proposal
Date: Mon, 8 Aug 2011 13:59:10 +0300
Message-ID: <20110808105910.GA25964@fermat.math.technion.ac.il>
In-Reply-To: <4E3FAA53.4030602@redhat.com>

> >* What is postcopy livemigration
> >It is yet another live migration mechanism for Qemu/KVM, which
> >implements the migration technique known as "postcopy" or "lazy"
> >migration. Just after the "migrate" command is invoked, the execution
> >host of a VM is instantaneously switched to a destination host.

Sounds like a cool idea.

> >The benefit is that total migration time is shorter because it transfers
> >each page only once. Precopy, on the other hand, may send the same pages
> >again and again because they can be dirtied.
> >The switching time from the source to the destination is several
> >hundred milliseconds, so it enables quick load balancing.
> >For details, please refer to the papers.

While these are the obvious benefits, the possible downside (that, as
always, depends on the workload) is the amount of time that the guest
workload runs more slowly than usual, waiting for pages it needs to
continue. There is a whole spectrum between the guest pausing completely
(which would solve all the problems of migration, but is often considered
unacceptable) and running at full speed. Is it acceptable that the guest
runs at 90% speed during the migration? 50%? 10%?
I guess we have nothing to lose from having both options, and choosing
the most appropriate technique for each guest!
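
To put rough numbers on the quoted claim that precopy may resend pages,
here is a minimal sketch of an idealized model. Everything in it is an
assumption made for illustration (the function names, the fixed
dirty_ratio, the stop threshold); it is not how QEMU actually decides
when to stop iterating.

```python
def precopy_pages_sent(total_pages, dirty_ratio, stop_threshold, max_rounds=30):
    """Pages sent by an idealized iterative precopy (hypothetical model).

    dirty_ratio is the assumed number of pages the guest dirties for every
    page sent in a round, so each round must resend that fraction of the
    previous round. Iteration stops once the remaining set is at most
    stop_threshold pages, or after max_rounds rounds.
    """
    to_send = total_pages
    sent = 0
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break
        sent += to_send
        # Pages dirtied while this round was in flight must be sent again.
        to_send = min(total_pages, int(to_send * dirty_ratio))
    # Final stop-and-copy of whatever is still dirty, with the guest paused.
    sent += to_send
    return sent


def postcopy_pages_sent(total_pages):
    """Postcopy sends each page once, on demand or in the background."""
    return total_pages


if __name__ == "__main__":
    print(precopy_pages_sent(1000, 0.5, 10))
    print(postcopy_pages_sent(1000))
```

With 1000 pages and a guest that dirties one page for every two sent,
this toy model sends 1990 pages for precopy against 1000 for postcopy.
It says nothing about how slowly the guest runs in the meantime, which is
the cost in question here.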

> That's terrific  (nice video also)!
> Orit and myself had the exact same idea too (now we can't patent it..).

I think a new implementation is not the only reason why you cannot patent
this idea :-) Demand-paged migration has actually been discussed (and done)
for nearly a quarter of a century (!) in the area of *process* migration.

The first use I'm aware of was in CMU's Accent in 1987 - see [1].
Another paper, [2], written in 1991, discusses how process migration is done
in UCB's Sprite operating system, and evaluates the various alternatives
common at the time (20 years ago), including what it calls "lazy copying",
which is more-or-less the same thing as "post copy". Mosix (a project which,
in some sense, is still alive today) also used some sort of cross between pre-copying
(of dirty pages) and copying on-demand of clean pages (from their backing
store on the source machine).

References
[1] "Attacking the Process Migration Bottleneck"
    http://www.nd.edu/~dthain/courses/cse598z/fall2004/papers/accent.pdf
[2] "Transparent Process Migration: Design Alternatives and the Sprite
    Implementation"
    http://nd.edu/~dthain/courses/cse598z/fall2004/papers/sprite-migration.pdf

> Advantages:
>         - Virtual machines are using more and more memory resources;
>         for a virtual machine with a very large working set, live
>         migration with reasonable downtime is impossible today.

If a guest's working set actually covers most of its allocated
memory, it will basically be unable to do any significant amount of work
on the destination VM until this large working set is transferred to the
destination. So in this scenario, "post copying" doesn't give any
significant advantages over plain-old "pause guest and send it to the
destination". Or am I missing something?
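
A back-of-the-envelope version of this argument, as a sketch: if the guest
cannot make progress until its working set has arrived, the stall is about
the working-set size divided by the link bandwidth, whichever scheme is
used. The function name and the example numbers (16 GiB working set,
1 Gbit/s link) are illustrative assumptions, and the model ignores protocol
overhead and page-fault latency.

```python
def working_set_transfer_seconds(working_set_bytes, link_bits_per_second):
    """Lower bound on the time to move a working set across the link."""
    return working_set_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    sixteen_gib = 16 * 2**30   # assumed working-set size, in bytes
    one_gbit = 1e9             # assumed link speed, in bits per second
    print(working_set_transfer_seconds(sixteen_gib, one_gbit))
```

Under these assumptions the guest waits more than two minutes before its
working set is resident on the destination, with or without postcopy.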

> Disadvantages:
>         - During the live migration the guest will run slower than in
>         today's live migration. We need to remember that even today
>         guests suffer from a performance penalty on the source during the
>         COW stage (memory copy).

I wonder if something like asynchronous page faults can help somewhat with
multi-process guest workloads (and a modified (PV) guest OS).

>         - Failure of the source or destination or the network will cause
>         us to lose the running virtual machine. Those failures are very
>         rare.

How is this different from a VM running on a single machine that fails?
Just that the small probability of failure (roughly) doubles for the
relatively-short duration of the transfer?
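
The "roughly doubles" estimate can be checked with a small sketch. It
assumes, purely for illustration, that the source and destination fail
independently and with the same probability p during the transfer window,
and it leaves network failures out; the function name is made up.

```python
def either_host_fails(p):
    """Probability that at least one of two independent hosts fails."""
    return 1 - (1 - p) ** 2   # equals 2p - p^2, close to 2p for small p


if __name__ == "__main__":
    p = 0.001                 # assumed per-host failure probability
    print(either_host_fails(p), 2 * p)
```

For small p the p^2 term is negligible, so the risk during the transfer is
very close to twice that of a single host over the same interval.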

-- 
Nadav Har'El                        |           Monday, Aug  8 2011, 8 Av 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |If glory comes after death, I'm not in a
http://nadav.harel.org.il           |hurry. (Latin proverb)
