Re: [RFC] postcopy livemigration proposal

From: Dor Laor <dlaor@redhat.com>
To: "Nadav Har'El" <nyh@math.technion.ac.il>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>,
	kvm@vger.kernel.org, qemu-devel@nongnu.org,
	t.hirofuchi@aist.go.jp, satoshi.itoh@aist.go.jp,
	Orit Wasserman <owasserm@redhat.com>, Avi Kivity <avi@redhat.com>
Subject: Re: [RFC] postcopy livemigration proposal
Date: Mon, 08 Aug 2011 14:47:07 +0300
Message-ID: <4E3FCCBB.4060205@redhat.com>
In-Reply-To: <20110808105910.GA25964@fermat.math.technion.ac.il>

On 08/08/2011 01:59 PM, Nadav Har'El wrote:
>>> * What is postcopy live migration
>>> It is yet another live migration mechanism for Qemu/KVM, which
>>> implements the migration technique known as "postcopy" or "lazy"
>>> migration. Just after the "migrate" command is invoked, the execution
>>> host of a VM is instantaneously switched to a destination host.
>
> Sounds like a cool idea.
>
>>> The benefit is that total migration time is shorter because it transfers
>>> each page only once. Precopy, on the other hand, may send the same pages
>>> again and again because they can be dirtied.
>>> The switching time from the source to the destination is several
>>> hundred milliseconds, which enables quick load balancing.
>>> For details, please refer to the papers.
>
> While these are the obvious benefits, the possible downside (that, as
> always, depends on the workload) is the amount of time that the guest
> workload runs more slowly than usual, waiting for pages it needs to
> continue. There is a whole spectrum between the guest pausing completely
> (which would solve all the problems of migration, but is often considered
> unacceptable) and running at full speed. Is it acceptable that the guest
> runs at 90% speed during the migration? 50%? 10%?
> I guess we have nothing to lose from having both options, and choosing
> the most appropriate technique for each guest!

+1
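
To put rough numbers on the 90% / 50% / 10% question, here is a minimal
back-of-the-envelope sketch. It assumes a deliberately simple model that is
not from the proposal: the guest stalls completely on every fault for a page
that has not arrived yet, and nothing overlaps with the stall. The function
name and the sample rates are hypothetical.

```python
def relative_speed(faults_per_s, fault_latency_s):
    """Fraction of full speed under a stall-on-every-remote-fault model.

    faults_per_s:    remote page faults per second of useful guest work
                     (assumed workload property)
    fault_latency_s: time to fetch one page from the source (assumed)
    """
    # For each second of useful work the guest also waits
    # faults_per_s * fault_latency_s seconds for pages to arrive.
    return 1.0 / (1.0 + faults_per_s * fault_latency_s)


if __name__ == "__main__":
    for faults in (10, 100, 1000, 10000):
        speed = relative_speed(faults, 0.001)  # 1 ms per remote fault, assumed
        print(f"{faults:>6} faults/s -> {speed:.0%} of full speed")
```

Under this model the answer depends almost entirely on the product of fault
rate and fetch latency, which is why the outcome is so workload dependent.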

>
>> That's terrific  (nice video also)!
>> Orit and myself had the exact same idea too (now we can't patent it..).
>
> I think a new implementation is not the only reason why you cannot patent
> this idea :-) Demand-paged migration has actually been discussed (and done)
> for nearly a quarter of a century (!) in the area of *process* migration.
>
> The first use I'm aware of was in CMU's Accent in 1987 - see [1].
> Another paper, [2], written in 1991, discusses how process migration is done
> in UCB's Sprite operating system, and evaluates the various alternatives
> common at the time (20 years ago), including what it calls "lazy copying",
> which is more or less the same thing as "post copy". Mosix (a project which,
> in some sense, is still alive today) also used some sort of cross between
> pre-copying (of dirty pages) and copying on-demand of clean pages (from
> their backing store on the source machine).
>
>
> References
> [1] "Attacking the Process Migration Bottleneck"
>       http://www.nd.edu/~dthain/courses/cse598z/fall2004/papers/accent.pdf

Without reading the internals: patents let you claim the application of an
existing idea to a new field. Anyway, there won't be a patent in this case.
Still, let's get the KVM innovation merged.

> [2]  "Transparent Process Migration: Design Alternatives and the Sprite
>       Implementation"
>       http://nd.edu/~dthain/courses/cse598z/fall2004/papers/sprite-migration.pdf
>
>> Advantages:
>>          - Virtual machines are using more and more memory. For a virtual
>>          machine with a very large working set, live migration with
>>          reasonable downtime is impossible today.
>
> If a guest actually constantly uses (working set) most of its allocated
> memory, it will basically be unable to do any significant amount of work
> on the destination VM until this large working set is transferred to the
> destination. So in this scenario, "post copying" doesn't give any
> significant advantages over plain-old "pause guest and send it to the
> destination". Or am I missing something?

There is one key advantage in this scheme/use case - if you have a guest 
with a very large working set, you'll need a very large downtime in 
order to migrate it with today's algorithm. With post copy (aka 
streaming/demand paging), the guest won't have any downtime but will run 
slower than expected.
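
As a rough illustration of that trade-off, here is a toy model. It is a
sketch under stated assumptions, not QEMU's actual algorithm: precopy resends
whatever was dirtied during the previous round and pauses the guest once the
remainder fits in the allowed downtime, while postcopy sends every page once
after a fixed switch-over. All names and parameters are hypothetical, and the
0.3 s default only stands in for the "several hundred milliseconds" quoted
above.

```python
def precopy(mem_mb, dirty_mb_s, bw_mb_s, max_downtime_s, max_rounds=30):
    """Toy iterative precopy. Returns (total_sent_mb, downtime_s, converged)."""
    remaining = float(mem_mb)
    sent = 0.0
    for _ in range(max_rounds):
        if remaining / bw_mb_s <= max_downtime_s:
            # Small enough: pause the guest and send the rest.
            return sent + remaining, remaining / bw_mb_s, True
        round_time = remaining / bw_mb_s
        sent += remaining
        # Pages dirtied while this round was in flight must be resent;
        # the dirty set can never exceed guest memory.
        remaining = min(float(mem_mb), dirty_mb_s * round_time)
    # Did not converge: pause anyway and send whatever is still dirty.
    return sent + remaining, remaining / bw_mb_s, False


def postcopy(mem_mb, switch_s=0.3):
    """Toy postcopy. Each page crosses the wire once; downtime is the switch."""
    return float(mem_mb), switch_s


if __name__ == "__main__":
    # 4 GB guest over a 100 MB/s link, 0.5 s downtime budget (all assumed).
    print("precopy, 50 MB/s dirtied: ", precopy(4096, 50, 100, 0.5))
    print("precopy, 200 MB/s dirtied:", precopy(4096, 200, 100, 0.5))
    print("postcopy:                 ", postcopy(4096))
```

In this model, once the guest dirties memory faster than the link can carry
it, precopy never shrinks the dirty set, and the final pause has to transfer
the whole working set. That is the large-downtime case described above.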

There are guests today that are impractical to truly live migrate.

btw: Even today, marking pages RO also carries some performance penalty.

>
>> Disadvantages:
>>          - During the live migration the guest will run slower than in
>>          today's live migration. We need to remember that even today
>>          guests suffer a performance penalty on the source during the
>>          COW stage (memory copy).
>
> I wonder if something like asynchronous page faults can help somewhat with
> multi-process guest workloads (and modified (PV) guest OS).

They should come into play to some extent. Note that only newer Linux
guests will benefit from them.

>
>>          - Failure of the source or destination or the network will cause
>>          us to lose the running virtual machine. Those failures are very
>>          rare.
>
> How is this different from a VM running on a single machine that fails?
> Just that the small probability of failure (roughly) doubles for the
> relatively-short duration of the transfer?

Exactly my point: this is not a major disadvantage, because the probability
is so low.
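
A minimal sketch of the "roughly doubles" arithmetic, assuming independent
failures with a constant rate per component. The rate and the transfer
duration below are made-up illustration values, not measurements.

```python
import math


def loss_probability(failures_per_hour, duration_hours, components):
    """Probability that at least one of `components` fails in the window.

    Assumes independent components with a constant (exponential) failure
    rate, which is a simplification.
    """
    p_one = 1.0 - math.exp(-failures_per_hour * duration_hours)
    return 1.0 - (1.0 - p_one) ** components


if __name__ == "__main__":
    rate = 1e-4      # failures per hour per host, hypothetical
    transfer = 0.1   # hours spent in postcopy, hypothetical
    single = loss_probability(rate, transfer, 1)  # VM on one host
    both = loss_probability(rate, transfer, 2)    # depends on source and dest
    print(f"one host:  {single:.3e}")
    print(f"two hosts: {both:.3e} (ratio {both / single:.4f})")
```

Counting the network as a third component raises the factor further, but
under these assumptions the exposure still lasts only as long as the transfer.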
