[Qemu-devel] [RFC]VM live snapshot proposal

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] [RFC]VM live snapshot proposal
@ 2014-03-03  1:13 Huangpeng (Peter)
  2014-03-03 12:32 ` Stefan Hajnoczi
  0 siblings, 1 reply; 26+ messages in thread
From: Huangpeng (Peter) @ 2014-03-03  1:13 UTC
  To: Paolo Bonzini, kwolf@redhat.com, stefanha@gmail.com,
	qemu-devel@nongnu.org, Wenchao Xia, Pavel Hrdina,
	KVM devel mailing list
  Cc: Zhanghailiang

Hi, All

I found some earlier discussion about VM live snapshots, but haven't seen any progress since.
https://lists.gnu.org/archive/html/qemu-devel/2013-08/msg02125.html
http://markmail.org/thread/shneezha7kmtosvb#query:+page:1+mid:shneezha7kmtosvb+state:results

Here is another proposal: build on the live-migration scheme and add consistent
memory state tracking and saving.
The idea is simple:
1. In the first round, use live migration to save all memory to a snapshot file.
2. Intercept memory modifications, save the old pages to a temporary file, and mark dirty bits.
3. Merge the temporary file into the original snapshot file.

Detailed process:
(1) Pause the VM.
(2) Save the device state to a temporary file (live migration already supports this).
(3) Make a disk snapshot.
(4) Enable the page dirty log and the old-dirty-page saving function (which we need to add).
(5) Resume the VM.
(6) Begin the first round of iteration, saving the entire contents of the VM memory pages
to the snapshot file.
(7) In the second round of iteration, save the old pages to the snapshot file.
(8) Merge the device state pre-saved in the temporary file into the snapshot file.
(9) End the RAM snapshot and do some cleanup work.

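A minimal, single-threaded sketch of the copy-before-write idea behind steps (4)-(7), in C for illustration. It is not the prototype described here: the names (`struct snap`, `snap_before_write`, `snap_copy_remaining`) are hypothetical, the write hook is called explicitly instead of being triggered by real write tracking, and old pages go straight into the snapshot image instead of through a temporary file. Whichever copy reaches a page first (the write hook or the background pass) wins, so the image reflects guest memory at the moment tracking was armed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NPAGES    8

/* Hypothetical model of one guest and its snapshot; not QEMU code. */
struct snap {
    uint8_t ram[NPAGES][PAGE_SIZE];   /* live guest memory */
    uint8_t image[NPAGES][PAGE_SIZE]; /* RAM area of the snapshot */
    bool    saved[NPAGES];            /* page already copied to image? */
};

/* Step (4): arm tracking while the VM is paused. */
static void snap_begin(struct snap *s)
{
    memset(s->saved, 0, sizeof s->saved);
}

/* Write hook: must run BEFORE the guest modifies page idx. */
static void snap_before_write(struct snap *s, size_t idx)
{
    assert(idx < NPAGES);
    if (!s->saved[idx]) {
        memcpy(s->image[idx], s->ram[idx], PAGE_SIZE); /* keep old data */
        s->saved[idx] = true;
    }
}

/* A guest store routed through the hook. */
static void guest_write(struct snap *s, size_t idx, size_t off, uint8_t val)
{
    assert(off < PAGE_SIZE);
    snap_before_write(s, idx);
    s->ram[idx][off] = val;
}

/* Step (6): background pass copies every page the hook has not saved.
 * Returns the number of pages copied by this pass. */
static size_t snap_copy_remaining(struct snap *s)
{
    size_t n = 0;
    for (size_t i = 0; i < NPAGES; i++) {
        if (!s->saved[i]) {
            memcpy(s->image[i], s->ram[i], PAGE_SIZE);
            s->saved[i] = true;
            n++;
        }
    }
    return n;
}
```

The hook has to run before the store because a dirty log only reports, after the fact, that a page was written; by then the old contents are gone, which is why step (4) needs a new "save old page" function on top of dirty logging.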
Because memory modifications may happen in KVM, QEMU, or vhost, the key part is how we
can provide a common page-modification tracking-and-saving API. We completed a prototype by
simply adding a modified-page tracking/saving function in QEMU, and it seems to work fine.

Is this approach acceptable? Or are there any better suggestions?

Thanks

Peter Huang

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03  1:13 [Qemu-devel] [RFC]VM live snapshot proposal Huangpeng (Peter)
@ 2014-03-03 12:32 ` Stefan Hajnoczi
  2014-03-03 12:55   ` Kevin Wolf
                     ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Stefan Hajnoczi @ 2014-03-03 12:32 UTC
  To: Huangpeng (Peter)
  Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang,
	KVM devel mailing list, qemu-devel@nongnu.org, Paolo Bonzini,
	Wenchao Xia

On Mon, Mar 03, 2014 at 01:13:41AM +0000, Huangpeng (Peter) wrote:

Just to summarize the idea of live savevm for people joining the
discussion:

It should be possible to save a snapshot of the guest (including memory,
devices, and disk) without noticeable downtime.

The 'savevm' command pauses the guest until the snapshot has been
completed and therefore doesn't meet the requirements.

> Here is another proposal: build on the live-migration scheme and add consistent
> memory state tracking and saving.
> The idea is simple:
> 1. In the first round, use live migration to save all memory to a snapshot file.
> 2. Intercept memory modifications, save the old pages to a temporary file, and mark dirty bits.
> 3. Merge the temporary file into the original snapshot file.
> 
> Detailed process:
> (1) Pause the VM.
> (2) Save the device state to a temporary file (live migration already supports this).
> (3) Make a disk snapshot.
> (4) Enable the page dirty log and the old-dirty-page saving function (which we need to add).
> (5) Resume the VM.
> (6) Begin the first round of iteration, saving the entire contents of the VM memory pages
> to the snapshot file.
> (7) In the second round of iteration, save the old pages to the snapshot file.
> (8) Merge the device state pre-saved in the temporary file into the snapshot file.
> (9) End the RAM snapshot and do some cleanup work.
> 
> Because memory modifications may happen in KVM, QEMU, or vhost, the key part is how we
> can provide a common page-modification tracking-and-saving API. We completed a prototype by
> simply adding a modified-page tracking/saving function in QEMU, and it seems to work fine.

Yes, this is the tricky part.  To be honest, I think this is the reason
no one has submitted patches - it's a hard task and the win isn't that
great (you can already migrate to file).

But back to the options:

If the host has enough free memory to fork QEMU, a small helper process
can be used to save the copy-on-write memory snapshot (thanks to fork(2)
semantics).  The hard part about the fork(2) approach is that QEMU isn't
really designed to fork, so work is necessary to reach a quiescent state
for the child process.
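A minimal sketch of the fork(2) approach, assuming Linux/POSIX and a plain buffer standing in for guest RAM; `fork_snapshot` and its `new_byte` parameter are hypothetical. After fork() the child's view of the buffer is frozen copy-on-write, so it can write a point-in-time image while the parent keeps modifying memory. The pipe exists only to guarantee that the parent's writes happen before the child saves, which makes the copy-on-write effect observable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Save `len` bytes of "guest RAM" to fd `out` from a forked child while
 * the parent keeps running. The parent overwrites RAM with `new_byte`
 * (a stand-in for the guest continuing to execute), yet the child still
 * sees the contents from fork time. Returns 0 on success, -1 on error.
 */
static int fork_snapshot(unsigned char *ram, size_t len, int out,
                         unsigned char new_byte)
{
    int go[2];
    assert(ram != NULL);
    if (pipe(go) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        close(go[0]);
        close(go[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wait until the parent has dirtied RAM, then save it. */
        char c;
        close(go[1]);
        if (read(go[0], &c, 1) != 1)
            _exit(1);
        for (size_t off = 0; off < len; ) {
            ssize_t n = write(out, ram + off, len - off);
            if (n <= 0)
                _exit(1);
            off += (size_t)n;
        }
        _exit(0);   /* skip stdio flushing inherited from the parent */
    }

    /* Parent ("guest"): keep modifying memory, then release the child. */
    close(go[0]);
    memset(ram, new_byte, len);
    ssize_t w = write(go[1], "g", 1);
    close(go[1]);

    int status;
    if (waitpid(pid, &status, 0) < 0 || w != 1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

fork() duplicates only the calling thread, which is one concrete reason a multi-threaded process like QEMU has to reach a quiescent state first. Extra host memory grows with every page the parent dirties while the child is still writing.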

If there is not enough memory to fork, then a synchronous approach to
catching guest memory writes is needed.  I'm not sure if a good
mechanism for that exists but the simplest would be mprotect(2) and a
signal handler (which will make the guest run very slowly).
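A minimal sketch of the mprotect(2) plus signal-handler approach, assuming Linux, where a store to a read-only page raises SIGSEGV with the faulting address in `si_addr`. All names (`wp_init`, `wp_start`, `guest_ram`, `old_pages`, `saved`) are hypothetical. The handler copies the page's old contents aside, marks it saved, and unprotects it, so each page faults at most once per snapshot. mprotect() is not on POSIX's async-signal-safe list; calling it from a SIGSEGV handler is a widely used Linux technique, but treat this as an illustration rather than production code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 4

static unsigned char *guest_ram;            /* stand-in for guest RAM */
static unsigned char *old_pages;            /* point-in-time copies */
static volatile sig_atomic_t saved[NPAGES]; /* 1 once a page is copied */
static size_t page_size;

/* SIGSEGV handler: save the page's old contents, then allow the write. */
static void wp_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)guest_ram;
    if (addr < base || addr >= base + NPAGES * page_size) {
        /* Not a tracked page: restore the default action. The faulting
         * access is retried on return and then terminates the process. */
        signal(sig, SIG_DFL);
        return;
    }
    size_t idx = (addr - base) / page_size;
    unsigned char *page = guest_ram + idx * page_size;
    memcpy(old_pages + idx * page_size, page, page_size);
    saved[idx] = 1;
    /* Returning restarts the faulting store, which now succeeds. */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
}

/* Allocate RAM and the save area, and install the handler. */
static int wp_init(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    guest_ram = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    old_pages = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guest_ram == MAP_FAILED || old_pages == MAP_FAILED)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = wp_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Start a snapshot: clear the saved bits and write-protect all of RAM. */
static int wp_start(void)
{
    assert(guest_ram != NULL);
    for (size_t i = 0; i < NPAGES; i++)
        saved[i] = 0;
    return mprotect(guest_ram, NPAGES * page_size, PROT_READ);
}
```

Every first write to a page now pays for a signal delivery and an extra mprotect() call, which is where the slowdown comes from; later writes to the same page run at full speed.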

Stefan

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 12:32 ` Stefan Hajnoczi
@ 2014-03-03 12:55   ` Kevin Wolf
  2014-03-03 13:19     ` Paolo Bonzini
  2014-03-04  1:06     ` Huangpeng (Peter)
  2014-03-03 13:18   ` Paolo Bonzini
  2014-03-04  1:02   ` Huangpeng (Peter)
  2 siblings, 2 replies; 26+ messages in thread
From: Kevin Wolf @ 2014-03-03 12:55 UTC
  To: Stefan Hajnoczi
  Cc: Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Huangpeng (Peter), qemu-devel@nongnu.org, Paolo Bonzini,
	Wenchao Xia

Am 03.03.2014 um 13:32 hat Stefan Hajnoczi geschrieben:
> On Mon, Mar 03, 2014 at 01:13:41AM +0000, Huangpeng (Peter) wrote:
> 
> Just to summarize the idea of live savevm for people joining the
> discussion:
> 
> It should be possible to save a snapshot of the guest (including memory,
> devices, and disk) without noticeable downtime.
> 
> The 'savevm' command pauses the guest until the snapshot has been
> completed and therefore doesn't meet the requirements.
> 
> > Here is another proposal: build on the live-migration scheme and add consistent
> > memory state tracking and saving.
> > The idea is simple:
> > 1. In the first round, use live migration to save all memory to a snapshot file.
> > 2. Intercept memory modifications, save the old pages to a temporary file, and mark dirty bits.
> > 3. Merge the temporary file into the original snapshot file.

Why do you need a temporary file for this? Couldn't you directly store
the memory to its final destination in the snapshot file?

> > Detailed process:
> > (1) Pause the VM.
> > (2) Save the device state to a temporary file (live migration already supports this).
> > (3) Make a disk snapshot.
> > (4) Enable the page dirty log and the old-dirty-page saving function (which we need to add).
> > (5) Resume the VM.
> > (6) Begin the first round of iteration, saving the entire contents of the VM memory pages
> > to the snapshot file.
> > (7) In the second round of iteration, save the old pages to the snapshot file.
> > (8) Merge the device state pre-saved in the temporary file into the snapshot file.
> > (9) End the RAM snapshot and do some cleanup work.
> > 
> > Because memory modifications may happen in KVM, QEMU, or vhost, the key part is how we
> > can provide a common page-modification tracking-and-saving API. We completed a prototype by
> > simply adding a modified-page tracking/saving function in QEMU, and it seems to work fine.
> 
> Yes, this is the tricky part.  To be honest, I think this is the reason
> no one has submitted patches - it's a hard task and the win isn't that
> great (you can already migrate to file).

So why don't we simply reuse the existing migration code?

Kevin

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 12:55   ` Kevin Wolf
@ 2014-03-03 13:19     ` Paolo Bonzini
  2014-03-03 13:30       ` Kevin Wolf
  2014-03-04  1:06     ` Huangpeng (Peter)
  1 sibling, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2014-03-03 13:19 UTC
  To: Kevin Wolf, Stefan Hajnoczi
  Cc: Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Huangpeng (Peter), qemu-devel@nongnu.org, Wenchao Xia

Il 03/03/2014 13:55, Kevin Wolf ha scritto:
>>> > > Because memory modifications may happen in KVM, QEMU, or vhost, the key part is how we
>>> > > can provide a common page-modification tracking-and-saving API. We completed a prototype by
>>> > > simply adding a modified-page tracking/saving function in QEMU, and it seems to work fine.
>> >
>> > Yes, this is the tricky part.  To be honest, I think this is the reason
>> > no one has submitted patches - it's a hard task and the win isn't that
>> > great (you can already migrate to file).
> So why don't we simply reuse the existing migration code?

I think this is different in the same way that block-backup and 
block-mirror are different.  Huangpeng's proposal would let you make a 
consistent snapshot of disks and RAM.

Paolo

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 13:19     ` Paolo Bonzini
@ 2014-03-03 13:30       ` Kevin Wolf
  2014-03-03 13:47         ` Paolo Bonzini
  2014-03-04  1:28         ` Huangpeng (Peter)
  0 siblings, 2 replies; 26+ messages in thread
From: Kevin Wolf @ 2014-03-03 13:30 UTC
  To: Paolo Bonzini
  Cc: Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, Huangpeng (Peter), qemu-devel@nongnu.org,
	Wenchao Xia

Am 03.03.2014 um 14:19 hat Paolo Bonzini geschrieben:
> Il 03/03/2014 13:55, Kevin Wolf ha scritto:
> >>>> > Because memory modifications may happen in KVM, QEMU, or vhost, the key part is how we
> >>>> > can provide a common page-modification tracking-and-saving API. We completed a prototype by
> >>>> > simply adding a modified-page tracking/saving function in QEMU, and it seems to work fine.
> >>>
> >>> Yes, this is the tricky part.  To be honest, I think this is the reason
> >>> no one has submitted patches - it's a hard task and the win isn't that
> >>> great (you can already migrate to file).
> >So why don't we simply reuse the existing migration code?
> 
> I think this is different in the same way that block-backup and
> block-mirror are different.  Huangpeng's proposal would let you make
> a consistent snapshot of disks and RAM.

Right. Though the point isn't about consistency (doing the disk snapshot
when memory has converged would be consistent as well), but about
having the snapshot semantically right at the time when the monitor
command is issued instead of only starting it then and being consistent
at the point of completion.

This is indeed like pre/post-copy live migration, and probably both
options have their uses. I would suggest starting with the easy one, and
adding the post-copy feature on top.

Kevin

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 13:30       ` Kevin Wolf
@ 2014-03-03 13:47         ` Paolo Bonzini
  2014-03-03 14:04           ` Kevin Wolf
                             ` (2 more replies)
  2014-03-04  1:28         ` Huangpeng (Peter)
  1 sibling, 3 replies; 26+ messages in thread
From: Paolo Bonzini @ 2014-03-03 13:47 UTC
  To: Kevin Wolf
  Cc: Andrea Arcangeli, Pavel Hrdina, Zhanghailiang,
	KVM devel mailing list, Stefan Hajnoczi, Huangpeng (Peter),
	qemu-devel@nongnu.org, Wenchao Xia

Il 03/03/2014 14:30, Kevin Wolf ha scritto:
> > > So why don't we simply reuse the existing migration code?
> > I think this is different in the same way that block-backup and
> > block-mirror are different.  Huangpeng's proposal would let you make
> > a consistent snapshot of disks and RAM.
> Right. Though the point isn't about consistency (doing the disk snapshot
> when memory has converged would be consistent as well), but about
> having the snapshot semantically right at the time when the monitor
> command is issued instead of only starting it then and being consistent
> at the point of completion.

Right---though it's not entirely true that migration only affects the 
point in time where you have consistency.  For example, with migration 
you cannot use the guest agent for freeze/thaw and, even if we changed 
the code to allow that, the pause would be much longer than for live 
snapshots or block-backup.

> This is indeed like pre/post-copy live migration, and probably both
> options have their uses. I would suggest starting with the easy one, and
> adding the post-copy feature on top.

The feature matrix for migration and snapshot

                           disk       RAM        internal snapshot
non-live                  yes (0)    yes (0)    yes
live, disk only           yes (1)    N/A        yes (2)
live, pre-copy            yes (3)    yes        no
live, post-copy           yes (4)    no         no
live, point-in-time       yes (5)    no         no

     (0) just stop VM while doing normal pre-copy migration
     (1) blockdev-snapshot-sync
     (2) blockdev-snapshot-internal-sync
     (3) block-stream
     (4) drive-mirror
     (5) drive-backup

By "the easy one" you mean live savevm with snapshot at the end of RAM 
migration, I guess.  But the functionality is already available using 
migration, while point-in-time snapshots actually add new functionality. 
  I'm not sure what's the status of the kernel infrastructure for 
post-copy.  Andrea?

Paolo

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 13:47         ` Paolo Bonzini
@ 2014-03-03 14:04           ` Kevin Wolf
  2014-03-03 14:55           ` Dr. David Alan Gilbert
  2014-03-03 19:52           ` Andrea Arcangeli
  2 siblings, 0 replies; 26+ messages in thread
From: Kevin Wolf @ 2014-03-03 14:04 UTC
  To: Paolo Bonzini
  Cc: Andrea Arcangeli, Pavel Hrdina, Zhanghailiang,
	KVM devel mailing list, Stefan Hajnoczi, Huangpeng (Peter),
	qemu-devel@nongnu.org, Wenchao Xia

Am 03.03.2014 um 14:47 hat Paolo Bonzini geschrieben:
> Il 03/03/2014 14:30, Kevin Wolf ha scritto:
> >> > So why don't we simply reuse the existing migration code?
> >> I think this is different in the same way that block-backup and
> >> block-mirror are different.  Huangpeng's proposal would let you make
> >> a consistent snapshot of disks and RAM.
> >Right. Though the point isn't about consistency (doing the disk snapshot
> >when memory has converged would be consistent as well), but about
> >having the snapshot semantically right at the time when the monitor
> >command is issued instead of only starting it then and being consistent
> >at the point of completion.
> 
> Right---though it's not entirely true that migration only affects
> the point in time where you have consistency.  For example, with
> migration you cannot use the guest agent for freeze/thaw and, even
> if we changed the code to allow that, the pause would be much longer
> than for live snapshots or block-backup.
> 
> >This is indeed like pre/post-copy live migration, and probably both
> >options have their uses. I would suggest starting with the easy one, and
> >adding the post-copy feature on top.
> 
> The feature matrix for migration and snapshot
> 
>                           disk       RAM        internal snapshot
> non-live                  yes (0)    yes (0)    yes
> live, disk only           yes (1)    N/A        yes (2)
> live, pre-copy            yes (3)    yes        no
> live, post-copy           yes (4)    no         no
> live, point-in-time       yes (5)    no         no
> 
>     (0) just stop VM while doing normal pre-copy migration
>     (1) blockdev-snapshot-sync
>     (2) blockdev-snapshot-internal-sync
>     (3) block-stream
>     (4) drive-mirror
>     (5) drive-backup
> 
> By "the easy one" you mean live savevm with snapshot at the end of
> RAM migration, I guess.  But the functionality is already available
> using migration, while point-in-time snapshots actually add new
> functionality.  I'm not sure what's the status of the kernel
> infrastructure for post-copy.  Andrea?

Yes, it's available, but only with RAM snapshots stored in an external
file, not with internal snapshots.

An incremental next step would be to avoid writing dirtied memory to two
places, because internal snapshots aren't a streaming, but a random
access interface, so you can overwrite the original place instead of
appending the new copy. That would already be a small advantage.
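A minimal sketch of the random-access idea, assuming the snapshot's RAM area is a flat array of page-sized slots starting at byte offset `ram_off` of the image file. `save_page_in_place` and this layout are hypothetical and are not QEMU's actual vmstate format. Because each page has a fixed slot, saving a dirtied page again with pwrite(2) overwrites the earlier copy, so the file never holds more than one copy per page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write guest page `idx` into its fixed slot in the snapshot's RAM area,
 * which is assumed to start at byte `ram_off` of the image file `fd`.
 * The slot depends only on the page index, so re-saving a dirtied page
 * overwrites the earlier copy instead of appending a new one.
 * Returns 0 on success, -1 on error.
 */
static int save_page_in_place(int fd, off_t ram_off, uint64_t idx,
                              const unsigned char *data, size_t page_size)
{
    assert(page_size > 0);
    off_t slot = ram_off + (off_t)(idx * page_size);
    size_t done = 0;
    while (done < page_size) {
        ssize_t n = pwrite(fd, data + done, page_size - done,
                           slot + (off_t)done);
        if (n <= 0) {
            perror("pwrite");
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

With a streaming target such as migrate-to-file, a page that is dirtied again can only be appended as a new record, and the loader keeps the last copy it sees; that is the "writing dirtied memory to two places" cost a random-access target avoids.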

Once you have this infrastructure, it's probably also a bit easier to
plug in any post-copy/point-in-time features that the migration code can
(be improved to) provide.

Kevin

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 13:47         ` Paolo Bonzini
  2014-03-03 14:04           ` Kevin Wolf
@ 2014-03-03 14:55           ` Dr. David Alan Gilbert
  2014-03-03 19:52           ` Andrea Arcangeli
  2 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2014-03-03 14:55 UTC
  To: Paolo Bonzini
  Cc: Kevin Wolf, Andrea Arcangeli, Pavel Hrdina, Zhanghailiang,
	KVM devel mailing list, Stefan Hajnoczi, Huangpeng (Peter),
	mrhines, qemu-devel@nongnu.org, Wenchao Xia

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> Il 03/03/2014 14:30, Kevin Wolf ha scritto:
> >> > So why don't we simply reuse the existing migration code?
> >> I think this is different in the same way that block-backup and
> >> block-mirror are different.  Huangpeng's proposal would let you make
> >> a consistent snapshot of disks and RAM.
> >Right. Though the point isn't about consistency (doing the disk snapshot
> >when memory has converged would be consistent as well), but about
> >having the snapshot semantically right at the time when the monitor
> >command is issued instead of only starting it then and being consistent
> >at the point of completion.
> 
> Right---though it's not entirely true that migration only affects
> the point in time where you have consistency.  For example, with
> migration you cannot use the guest agent for freeze/thaw and, even
> if we changed the code to allow that, the pause would be much longer
> than for live snapshots or block-backup.
> 
> >This is indeed like pre/post-copy live migration, and probably both
> >options have their uses. I would suggest starting with the easy one, and
> >adding the post-copy feature on top.
> 
> The feature matrix for migration and snapshot
> 
>                           disk       RAM        internal snapshot
> non-live                  yes (0)    yes (0)    yes
> live, disk only           yes (1)    N/A        yes (2)
> live, pre-copy            yes (3)    yes        no
> live, post-copy           yes (4)    no         no
> live, point-in-time       yes (5)    no         no
> 
>     (0) just stop VM while doing normal pre-copy migration
>     (1) blockdev-snapshot-sync
>     (2) blockdev-snapshot-internal-sync
>     (3) block-stream
>     (4) drive-mirror
>     (5) drive-backup
> 
> By "the easy one" you mean live savevm with snapshot at the end of
> RAM migration, I guess.  But the functionality is already available
> using migration, while point-in-time snapshots actually add new
> functionality.  I'm not sure what's the status of the kernel
> infrastructure for post-copy.  Andrea?

Accumulating the running set of changes that migration is spitting out
gets you some of the way, but to do it you have to have points in the
migration stream which represent a consistent view of device state, RAM
and disk, and I think the tricky part is getting those consistent points.
While the CPU is running, the pages that migration spits out are
certainly newer than the old versions of those pages, but I don't think
you can just put a marker in and say that the point represents a single
consistent view of RAM.

In many ways this is the opposite of Michael Hines's microcheckpointing
approach, which regularly stops everything and takes a snapshot; I did
suggest that one modification to it would be to COW those checkpoints.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 13:47         ` Paolo Bonzini
  2014-03-03 14:04           ` Kevin Wolf
  2014-03-03 14:55           ` Dr. David Alan Gilbert
@ 2014-03-03 19:52           ` Andrea Arcangeli
  2014-03-04  1:35             ` Huangpeng (Peter)
  2014-03-05  1:52             ` Huangpeng (Peter)
  2 siblings, 2 replies; 26+ messages in thread
From: Andrea Arcangeli @ 2014-03-03 19:52 UTC
  To: Paolo Bonzini
  Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, Huangpeng (Peter), qemu-devel@nongnu.org,
	Wenchao Xia

Hi Paolo,

On Mon, Mar 03, 2014 at 02:47:31PM +0100, Paolo Bonzini wrote:
>   I'm not sure what's the status of the kernel infrastructure for 
> post-copy.  Andrea?

sys_userfaultfd is still a work in progress, but there shouldn't be much
work left to complete it. madvise(MADV_USERFAULT) and
remap_anon_pages() have been complete for a while.

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 19:52           ` Andrea Arcangeli
@ 2014-03-04  1:35             ` Huangpeng (Peter)
  2014-03-05 14:46               ` Andrea Arcangeli
  2014-03-05  1:52             ` Huangpeng (Peter)
  1 sibling, 1 reply; 26+ messages in thread
From: Huangpeng (Peter) @ 2014-03-04  1:35 UTC
  To: Andrea Arcangeli, Paolo Bonzini
  Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, qemu-devel@nongnu.org, Wenchao Xia

 
> Hi Paolo,
> 
> On Mon, Mar 03, 2014 at 02:47:31PM +0100, Paolo Bonzini wrote:
> >   I'm not sure what's the status of the kernel infrastructure for
> > post-copy.  Andrea?
> 
> sys_userfaultfd is still a work in progress, but there shouldn't be much work left to
> complete it. madvise(MADV_USERFAULT) and
> remap_anon_pages() have been complete for a while.

http://qemu-project.org/Features/PostCopyLiveMigration
From the feature description, post-copy uses memory copy, so this infrastructure
will solve that problem but does not help snapshots, am I right?

Thanks

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-04  1:35             ` Huangpeng (Peter)
@ 2014-03-05 14:46               ` Andrea Arcangeli
  0 siblings, 0 replies; 26+ messages in thread
From: Andrea Arcangeli @ 2014-03-05 14:46 UTC
  To: Huangpeng (Peter)
  Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini,
	Wenchao Xia

Hi,

On Tue, Mar 04, 2014 at 01:35:53AM +0000, Huangpeng (Peter) wrote:
>  
> > Hi Paolo,
> > 
> > On Mon, Mar 03, 2014 at 02:47:31PM +0100, Paolo Bonzini wrote:
> > >   I'm not sure what's the status of the kernel infrastructure for
> > > post-copy.  Andrea?
> > 
> > sys_userfaultfd is still a work in progress, but there shouldn't be much work left to
> > complete it. madvise(MADV_USERFAULT) and
> > remap_anon_pages() have been complete for a while.
> 
> http://qemu-project.org/Features/PostCopyLiveMigration
> From the feature description, post-copy uses memory copy, so this infrastructure
> will solve that problem but does not help snapshots, am I right?

Correct, there's no copy with this infrastructure, other than whatever
data copy may be happening inside the network receive protocol
for skb linearization into userland memory. With RDMA or zerocopy DMA
receive mechanisms, there may be no copy at all.

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 19:52           ` Andrea Arcangeli
  2014-03-04  1:35             ` Huangpeng (Peter)
@ 2014-03-05  1:52             ` Huangpeng (Peter)
  2014-03-05 14:55               ` Andrea Arcangeli
  1 sibling, 1 reply; 26+ messages in thread
From: Huangpeng (Peter) @ 2014-03-05  1:52 UTC
  To: Andrea Arcangeli, Paolo Bonzini
  Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, qemu-devel@nongnu.org, Wenchao Xia

Hi, Andrea

Where can I get the development git branch?
I'd like to use it to try coding the snapshot prototype.

Thanks.

> -----Original Message-----
> From: Andrea Arcangeli [mailto:aarcange@redhat.com]
> Sent: Tuesday, March 04, 2014 3:52 AM
> To: Paolo Bonzini
> Cc: Kevin Wolf; Stefan Hajnoczi; Huangpeng (Peter); qemu-devel@nongnu.org;
> Wenchao Xia; Pavel Hrdina; KVM devel mailing list; Zhanghailiang
> Subject: Re: [RFC]VM live snapshot proposal
> 
> Hi Paolo,
> 
> On Mon, Mar 03, 2014 at 02:47:31PM +0100, Paolo Bonzini wrote:
> >   I'm not sure what's the status of the kernel infrastructure for
> > post-copy.  Andrea?
> 
> sys_userfaultfd is still a work in progress, but there shouldn't be much work left to
> complete it. madvise(MADV_USERFAULT) and
> remap_anon_pages() have been complete for a while.

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-05  1:52             ` Huangpeng (Peter)
@ 2014-03-05 14:55               ` Andrea Arcangeli
  0 siblings, 0 replies; 26+ messages in thread
From: Andrea Arcangeli @ 2014-03-05 14:55 UTC
  To: Huangpeng (Peter)
  Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini,
	Wenchao Xia

On Wed, Mar 05, 2014 at 01:52:14AM +0000, Huangpeng (Peter) wrote:
> Hi, Andrea
> 
> Where can I get the development git branch?
> I'd like to use it to try coding the snapshot prototype.

You can find the current status in the origin/master branch here:
http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git

However, userfaultfd is still missing, so it's not yet good for
transparent userfaults when O_DIRECT or other gup users trigger the
access (those would currently return an error to userland if they hit
a userfault vma; we don't want userland to ever get an error, or the
modifications to userland would be too big).

userfaultfd will let the kernel wait on an event from the migration
thread and talk with the migration thread directly. So userland won't
be able to notice the userfault happening inside a write() or kvm
ioctl() syscall (you could notice it only if you strace the migration
thread). That's also more efficient, because the host scheduler can
switch directly to the migration thread without having to return to
userland first. And after remap_anon_pages completes and the host
scheduler runs the vcpu or I/O thread again, gup_fast can continue in
kernel mode where it stopped, without unnecessary exits to userland.
Making the kernel speak directly to the migration thread is somewhat
trickier at the kernel level than what you find in aa.git right now,
but it is worth it to be transparent to all syscalls that would trip
on userfaults with gup_fast.

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 13:30       ` Kevin Wolf
  2014-03-03 13:47         ` Paolo Bonzini
@ 2014-03-04  1:28         ` Huangpeng (Peter)
  2014-03-04  9:40           ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 26+ messages in thread
From: Huangpeng (Peter) @ 2014-03-04  1:28 UTC
  To: Kevin Wolf, Paolo Bonzini
  Cc: Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, qemu-devel@nongnu.org, Wenchao Xia

> > I think this is different in the same way that block-backup and
> > block-mirror are different.  Huangpeng's proposal would let you make a
> > consistent snapshot of disks and RAM.
> 
> Right. Though the point isn't about consistency (doing the disk snapshot when
> memory has converged would be consistent as well), but about having the
> snapshot semantically right at the time when the monitor command is issued
> instead of only starting it then and being consistent at the point of completion.
> 
> This is indeed like pre/post-copy live migration, and probably both options have
> their uses. I would suggest starting with the easy one, and adding the
> post-copy feature on top.
> 

Good suggestion. The latest post-copy patches seem to have been updated 2 years ago:
https://github.com/yamahata/qemu

One question:
Can post-copy fall back if exceptions happen during post-copy?

Thanks

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-04  1:28         ` Huangpeng (Peter)
@ 2014-03-04  9:40           ` Dr. David Alan Gilbert
  2014-03-05  1:00             ` Huangpeng (Peter)
  2014-03-06  1:42             ` Huangpeng (Peter)
  0 siblings, 2 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2014-03-04  9:40 UTC
  To: Huangpeng (Peter)
  Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini,
	Wenchao Xia

* Huangpeng (Peter) (peter.huangpeng@huawei.com) wrote:
> > > I think this is different in the same way that block-backup and
> > > block-mirror are different.  Huangpeng's proposal would let you make a
> > > consistent snapshot of disks and RAM.
> > 
> > Right. Though the point isn't about consistency (doing the disk snapshot when
> > memory has converged would be consistent as well), but about having the
> > snapshot semantically right at the time when the monitor command is issued
> > instead of only starting it then and being consistent at the point of completion.
> > 
> > This is indeed like pre/post-copy live migration, and probably both options have
> > their uses. I would suggest starting with the easy one, and adding the
> > post-copy feature on top.
> > 
> 
> Good suggestion. The latest post-copy patches seem to have been updated 2 years ago:
> https://github.com/yamahata/qemu

I'm working on post-copy at the moment, using Andrea's kernel code,
using bits of Yamahata's code base as well; hopefully it won't
be too long until we have something to post.

> One question:
> Can post-copy fall back if exceptions happen during post-copy?

What do you mean by 'exceptions' here?  Generally postcopy can't fall
back to precopy because once you're in postcopy mode the state
is split between the two machines.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-04  9:40           ` Dr. David Alan Gilbert
@ 2014-03-05  1:00             ` Huangpeng (Peter)
  2014-03-05  9:09               ` Paolo Bonzini
  2014-03-06  1:42             ` Huangpeng (Peter)
  1 sibling, 1 reply; 26+ messages in thread
From: Huangpeng (Peter) @ 2014-03-05  1:00 UTC
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini,
	Wenchao Xia

> > Good suggestion. The latest post-copy patches seem to have been updated 2 years
> > ago:
> > https://github.com/yamahata/qemu
> 
> I'm working on post-copy at the moment, using Andrea's kernel code, using bits
> of Yamahata's code base as well; hopefully it won't be too long until we have
> something to post.
> 
> > One question:
> > Can post-copy fall back if exceptions happen during post-copy?
> 
> What do you mean by 'exceptions' here?  Generally postcopy can't fall back to
> precopy because once you're in postcopy mode the state is split between the
> two machines.
> 

For example, if the destination VM is interrupted by a memory-copy error or another
exception: with the pre-copy scheme, we can fall back to the source VM.

One simple question (it may have been discussed before): what kind of scenario
does post-copy aim for?

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-05  1:00             ` Huangpeng (Peter)
@ 2014-03-05  9:09               ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2014-03-05  9:09 UTC (permalink / raw)
  To: Huangpeng (Peter), Dr. David Alan Gilbert
  Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, qemu-devel@nongnu.org, Wenchao Xia

On 05/03/2014 02:00, Huangpeng (Peter) wrote:
>>> One question:
>>> Can post-copy fall back if exceptions happen during post-copy?
>>
>> What do you mean by 'exceptions' here?  Generally postcopy can't fall back to
>> precopy because once you're in postcopy mode the state is split between the
>> two machines.
>
> For example, if the destination VM is interrupted by a memory-copy error or some
> other exception: with the pre-copy scheme, we can fall back to the source VM.

No, postcopy cannot do that.

However, this is a limitation of postcopy, not of the kernel interfaces 
that Andrea is adding.  If you use those interfaces to implement live VM 
point-in-time snapshots, you can drop the snapshotting operation safely 
and keep the VM running.

> One simple question (it may have been discussed before): what kind of scenario
> is post-copy aimed at?

Mostly cases where pre-copy migration doesn't converge because the guest 
is too big, or when you require a really, really small downtime.

It can be useful when you have to evacuate a host as fast as possible 
(due to detecting an intrusion or impending hardware failure), because 
the alternative is to shut down the VM immediately.  It can also be used
for upgrading QEMU on the host where the VM is running; in this case you 
can use a Unix socket for transport, and eliminate the chance of 
migration failing due to a network problem.

Paolo

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-04  9:40           ` Dr. David Alan Gilbert
  2014-03-05  1:00             ` Huangpeng (Peter)
@ 2014-03-06  1:42             ` Huangpeng (Peter)
  2014-03-06  9:14               ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 26+ messages in thread
From: Huangpeng (Peter) @ 2014-03-06  1:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini,
	Wenchao Xia

Hi David

Where can I get your post-copy git tree?
I would like to take a look at it before starting the live-snapshot design.

Thanks.

 
> I'm working on post-copy at the moment, using Andrea's kernel code, using bits
> of Yamahata's code base as well; hopefully it won't be too long until we have
> something to post.
> 
> > One question:
> > Can post-copy fall back if exceptions happen during post-copy?
> 
> What do you mean by 'exceptions' here?  Generally postcopy can't fall back to
> precopy because once you're in postcopy mode the state is split between the
> two machines.
> 
> Dave
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-06  1:42             ` Huangpeng (Peter)
@ 2014-03-06  9:14               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2014-03-06  9:14 UTC (permalink / raw)
  To: Huangpeng (Peter)
  Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini,
	Wenchao Xia

* Huangpeng (Peter) (peter.huangpeng@huawei.com) wrote:
> Hi David
> 
> Where can I get your post-copy git tree?
> I would like to take a look at it before starting the live-snapshot design.

It's not yet published; as soon as it shows signs of life and I tidy it up,
I'll get it out there.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 12:55   ` Kevin Wolf
  2014-03-03 13:19     ` Paolo Bonzini
@ 2014-03-04  1:06     ` Huangpeng (Peter)
  1 sibling, 0 replies; 26+ messages in thread
From: Huangpeng (Peter) @ 2014-03-04  1:06 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi
  Cc: Pavel Hrdina, Zhanghailiang, KVM devel mailing list,
	qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia



> > > Here I have another proposal, based on the live-migration scheme,
> > > add consistent memory state tracking and saving.
> > > The idea is simple:
> > > 1. First round: use live migration to save all memory to a snapshot file.
> > > 2. Intercept memory modifications, save the old pages to a
> > >    temporary file and mark dirty bits.
> > > 3. Merge the temporary file into the original snapshot file.
> 
> Why do you need a temporary file for this? Couldn't you directly store the
> memory to its final destination in the snapshot file?
> 

Writing to the same snapshot file requires dealing with write protection. For now
we implemented the prototype in the simplest way; if this proposal is accepted,
we will address it.

thanks.
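Kevin's suggestion of storing each page directly at its final offset in the
snapshot file, instead of going through a temporary file, can be sketched in C
as below. This is a minimal, single-threaded sketch under stated assumptions:
the names (RamSnapshot, snapshot_save_page, snapshot_before_write,
snapshot_background_pass) are hypothetical and not QEMU APIs, pages are assumed
to be a fixed 4 KiB, and the RAM image is assumed to be a flat array of page
slots starting at ram_offset. The per-page "saved" bitmap is where the write
protection concern shows up: the copy-before-write path and the background pass
write to the same file, whichever reaches a page first wins, and a page saved
before the guest modified it is never overwritten with newer contents.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define SNAP_PAGE_SIZE 4096  /* assumption: fixed 4 KiB pages */

/* Hypothetical snapshot state; not an existing QEMU structure. */
typedef struct {
    int fd;            /* snapshot file, already opened by the caller */
    off_t ram_offset;  /* byte offset of the RAM image inside the file */
    size_t npages;
    bool *saved;       /* saved[i]: slot i already holds snapshot-time data */
} RamSnapshot;

static int snapshot_init(RamSnapshot *s, int fd, off_t ram_offset, size_t npages)
{
    s->fd = fd;
    s->ram_offset = ram_offset;
    s->npages = npages;
    s->saved = calloc(npages, sizeof(bool));
    return s->saved ? 0 : -1;
}

/* Store page 'idx' at its final offset in the file, but only the first time. */
static int snapshot_save_page(RamSnapshot *s, size_t idx, const unsigned char *data)
{
    assert(idx < s->npages);
    if (s->saved[idx]) {
        return 0;  /* slot already holds the snapshot-time contents */
    }
    off_t off = s->ram_offset + (off_t)idx * SNAP_PAGE_SIZE;
    if (pwrite(s->fd, data, SNAP_PAGE_SIZE, off) != SNAP_PAGE_SIZE) {
        return -1;  /* short writes are treated as errors in this sketch */
    }
    s->saved[idx] = true;
    return 0;
}

/* Copy-before-write path: must run *before* the guest modifies page 'idx'. */
static int snapshot_before_write(RamSnapshot *s, const unsigned char *ram, size_t idx)
{
    return snapshot_save_page(s, idx, ram + idx * SNAP_PAGE_SIZE);
}

/* Background pass: any page not yet saved has not been modified since the
 * snapshot started, so its current contents are the snapshot-time contents. */
static int snapshot_background_pass(RamSnapshot *s, const unsigned char *ram)
{
    for (size_t i = 0; i < s->npages; i++) {
        if (snapshot_save_page(s, i, ram + i * SNAP_PAGE_SIZE) != 0) {
            return -1;
        }
    }
    return 0;
}

static void snapshot_free(RamSnapshot *s)
{
    free(s->saved);
    s->saved = NULL;
}
```

A real implementation would need an atomic test-and-set on the bitmap, since
the background pass and the write-interception path run concurrently, and it
would still need a way to call snapshot_before_write() for writes made by KVM,
QEMU and vhost, which is the hard part discussed in this thread.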

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 12:32 ` Stefan Hajnoczi
  2014-03-03 12:55   ` Kevin Wolf
@ 2014-03-03 13:18   ` Paolo Bonzini
  2014-03-04  1:02   ` Huangpeng (Peter)
  2 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2014-03-03 13:18 UTC (permalink / raw)
  To: Stefan Hajnoczi, Huangpeng (Peter)
  Cc: kwolf@redhat.com, Andrea Arcangeli, Pavel Hrdina, Zhanghailiang,
	KVM devel mailing list, qemu-devel@nongnu.org,
	Dr David Alan Gilbert, Wenchao Xia

On 03/03/2014 13:32, Stefan Hajnoczi wrote:
> If there is not enough memory to fork, then a synchronous approach to
> catching guest memory writes is needed.  I'm not sure if a good
> mechanism for that exists but the simplest would be mprotect(2) and a
> signal handler (which will make the guest run very slowly).

I think we'll be adding such a mechanism, but for guest memory reads, 
for postcopy migration.  Perhaps it could be reused for live snapshotting?

Paolo
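The mprotect(2)-plus-signal-handler approach Stefan mentions can be sketched as
follows. It is a hypothetical illustration, not QEMU code: the wp_* names are
invented, a single region is tracked, and old pages are copied into a
preallocated in-memory buffer rather than a file. Each page is write-protected
when the snapshot starts; the first write to a page faults, the SIGSEGV handler
copies the page's snapshot-time contents, re-enables writes to that page and
returns, so the faulting store is retried and succeeds.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical tracking state for one region, global so the handler sees it. */
static unsigned char *wp_ram;             /* tracked "guest RAM" */
static size_t wp_size;                    /* bytes, multiple of the page size */
static size_t wp_page;                    /* host page size */
static unsigned char *wp_old;             /* preallocated copies of old pages */
static volatile unsigned char *wp_saved;  /* wp_saved[i]: page i was copied */

static void wp_fault_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)wp_ram;
    if (addr < base || addr >= base + wp_size) {
        signal(sig, SIG_DFL);  /* a genuine crash: fault again and die */
        return;
    }
    size_t idx = (addr - base) / wp_page;
    unsigned char *page = wp_ram + idx * wp_page;
    memcpy(wp_old + idx * wp_page, page, wp_page);  /* keep snapshot-time data */
    wp_saved[idx] = 1;
    mprotect(page, wp_page, PROT_READ | PROT_WRITE);  /* store is retried */
}

/* Start tracking: write-protect the region so each page faults only once. */
static int wp_start(unsigned char *ram, size_t size)
{
    wp_page = (size_t)sysconf(_SC_PAGESIZE);
    if (size % wp_page != 0 || (uintptr_t)ram % wp_page != 0) {
        return -1;
    }
    wp_ram = ram;
    wp_size = size;
    wp_old = malloc(size);
    wp_saved = calloc(size / wp_page, 1);
    if (wp_old == NULL || wp_saved == NULL) {
        return -1;
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = wp_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) {
        return -1;
    }
    return mprotect(ram, size, PROT_READ);
}

/* Snapshot-time contents of page idx: the saved copy if the page has been
 * written since wp_start(), otherwise the (still unchanged) live page. */
static const unsigned char *wp_snapshot_page(size_t idx)
{
    return wp_saved[idx] ? wp_old + idx * wp_page : wp_ram + idx * wp_page;
}
```

Two caveats: mprotect() is not on the POSIX async-signal-safe list, although
calling it from a SIGSEGV handler is common practice on Linux; and this only
traps stores made by QEMU's own threads through its mappings. Writes the kernel
performs on QEMU's behalf, such as KVM running vCPUs or vhost, do not reach this
handler, which is one reason the thread turns to kernel support instead.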

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-03 12:32 ` Stefan Hajnoczi
  2014-03-03 12:55   ` Kevin Wolf
  2014-03-03 13:18   ` Paolo Bonzini
@ 2014-03-04  1:02   ` Huangpeng (Peter)
  2014-03-04  8:54     ` Stefan Hajnoczi
  2 siblings, 1 reply; 26+ messages in thread
From: Huangpeng (Peter) @ 2014-03-04  1:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang,
	KVM devel mailing list, qemu-devel@nongnu.org, Paolo Bonzini,
	Wenchao Xia

 
> Yes, this is the tricky part.  To be honest, I think this is the reason no one has
> submitted patches - it's a hard task and the win isn't that great (you can
> already migrate to file).
>
Yes, lots of places have to be considered. Although the scenarios are limited, use
cases such as library experiments may need to revert repeatedly to the same VM
state (memory state + disk state).

The key part is tracking and saving the consistent state as of snapshot time.
KVM, QEMU and vhost already implement dirty tracking; my proposal adds common
save-old-page APIs to preserve that consistent state. Is this the right approach,
or do you have other suggestions?

> But back to the options:
> 
> If the host has enough free memory to fork QEMU, a small helper process can
> be used to save the copy-on-write memory snapshot (thanks to fork(2)
> semantics).  The hard part about the fork(2) approach is that QEMU isn't
> really designed to fork, so work is necessary to reach a quiescent state for the
> child process.
> 
> If there is not enough memory to fork, then a synchronous approach to
> catching guest memory writes is needed.  I'm not sure if a good mechanism
> for that exists but the simplest would be mprotect(2) and a signal handler
> (which will make the guest run very slowly).
> 
> Stefan

In a real production environment, memory over-commit, or using as much memory as
possible, may be the normal case, so the fork approach cannot meet our needs.

Are there any other proposals for implementing VM snapshots?

Thanks.

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-04  1:02   ` Huangpeng (Peter)
@ 2014-03-04  8:54     ` Stefan Hajnoczi
  2014-03-04  9:05       ` Paolo Bonzini
  2014-03-05  0:46       ` Huangpeng (Peter)
  0 siblings, 2 replies; 26+ messages in thread
From: Stefan Hajnoczi @ 2014-03-04  8:54 UTC (permalink / raw)
  To: Huangpeng (Peter)
  Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang,
	KVM devel mailing list, qemu-devel@nongnu.org, Paolo Bonzini,
	Wenchao Xia

On Tue, Mar 04, 2014 at 01:02:44AM +0000, Huangpeng (Peter) wrote:
> > But back to the options:
> > 
> > If the host has enough free memory to fork QEMU, a small helper process can
> > be used to save the copy-on-write memory snapshot (thanks to fork(2)
> > semantics).  The hard part about the fork(2) approach is that QEMU isn't
> > really designed to fork, so work is necessary to reach a quiescent state for the
> > child process.
> > 
> > If there is not enough memory to fork, then a synchronous approach to
> > catching guest memory writes is needed.  I'm not sure if a good mechanism
> > for that exists but the simplest would be mprotect(2) and a signal handler
> > (which will make the guest run very slowly).
> > 
> > Stefan
> 
> In a real production environment, memory over-commit, or using as much memory as
> possible, may be the normal case, so the fork approach cannot meet our needs.

Yes, I think you're right.  The fork approach only works in the easy
case where there is plenty of free host memory.
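The fork(2) approach can be sketched as below; the function names are
hypothetical. After fork(2) the child sees guest RAM exactly as it was at fork
time, thanks to copy-on-write, and writes it out while the parent keeps running.
Because QEMU is multi-threaded, only the forking thread exists in the child, so
this sketch restricts the child to async-signal-safe calls (write, fsync, _exit)
and touches no other QEMU state; that is one face of the "QEMU isn't really
designed to fork" problem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: write the copy-on-write view of 'ram' to 'fd', then exit.
 * Only async-signal-safe calls are used, since the parent is threaded. */
static void snapshot_child(int fd, const unsigned char *ram, size_t size)
{
    size_t done = 0;
    while (done < size) {
        ssize_t n = write(fd, ram + done, size - done);
        if (n <= 0) {
            _exit(EXIT_FAILURE);
        }
        done += (size_t)n;
    }
    _exit(fsync(fd) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
}

/* Parent side: fork the helper and return at once; the caller (the "guest")
 * may keep modifying 'ram'. Returns the child's pid, or -1 on failure. */
static pid_t snapshot_fork(int fd, const unsigned char *ram, size_t size)
{
    pid_t pid = fork();
    if (pid == 0) {
        snapshot_child(fd, ram, size);  /* does not return */
    }
    return pid;
}

/* Reap the helper; returns 0 if it saved everything successfully. */
static int snapshot_wait(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS) ? 0 : -1;
}
```

The memory cost is visible here too: every page the parent writes while the
child is still saving gets duplicated by copy-on-write, so in the worst case the
host needs up to twice the guest's RAM, which is why this does not fit
over-committed hosts.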

> Are there any other proposals for implementing VM snapshots?

See the discussion by Paolo and Andrea about post-copy migration, which
adds kernel memory management features for tracking userspace page
faults.  Perhaps you can use that infrastructure to trap guest writes.

Stefan

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-04  8:54     ` Stefan Hajnoczi
@ 2014-03-04  9:05       ` Paolo Bonzini
  2014-03-04 11:28         ` Wenchao Xia
  2014-03-05  0:46       ` Huangpeng (Peter)
  1 sibling, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2014-03-04  9:05 UTC (permalink / raw)
  To: Stefan Hajnoczi, Huangpeng (Peter)
  Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang,
	KVM devel mailing list, qemu-devel@nongnu.org, Wenchao Xia

On 04/03/2014 09:54, Stefan Hajnoczi wrote:
>> Are there any other proposals for implementing VM snapshots?
> See the discussion by Paolo and Andrea about post-copy migration, which
> adds kernel memory management features for tracking userspace page
> faults.  Perhaps you can use that infrastructure to trap guest writes.

That infrastructure actually traps guest reads too.  But it's fine, as 
they are a superset of guest writes and the image will still be consistent.

Paolo

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-04  9:05       ` Paolo Bonzini
@ 2014-03-04 11:28         ` Wenchao Xia
  0 siblings, 0 replies; 26+ messages in thread
From: Wenchao Xia @ 2014-03-04 11:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang,
	KVM devel mailing list, Stefan Hajnoczi, Huangpeng (Peter),
	qemu-devel@nongnu.org, Wenchao Xia

On 2014/3/4 17:05, Paolo Bonzini wrote:
> On 04/03/2014 09:54, Stefan Hajnoczi wrote:
>>> Are there any other proposals for implementing VM snapshots?
>> See the discussion by Paolo and Andrea about post-copy migration, which
>> adds kernel memory management features for tracking userspace page
>> faults. Perhaps you can use that infrastructure to trap guest writes.
>
> That infrastructure actually traps guest reads too. But it's fine, as 
> they are a superset of guest writes and the image will still be 
> consistent.
>
> Paolo
>
I heard that the kernel is going to get an API that lets userspace catch memory
operations which originally could only be caught by kernel code. I am not sure
of its current status, but if the kernel has it, QEMU can use it more gracefully
than by modifying the migration code.

* Re: [Qemu-devel] [RFC]VM live snapshot proposal
  2014-03-04  8:54     ` Stefan Hajnoczi
  2014-03-04  9:05       ` Paolo Bonzini
@ 2014-03-05  0:46       ` Huangpeng (Peter)
  1 sibling, 0 replies; 26+ messages in thread
From: Huangpeng (Peter) @ 2014-03-05  0:46 UTC (permalink / raw)
  To: Stefan Hajnoczi, Paolo Bonzini
  Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang,
	KVM devel mailing list, qemu-devel@nongnu.org, Wenchao Xia

> > Are there any other proposals for implementing VM snapshots?
> 
> See the discussion by Paolo and Andrea about post-copy migration, which adds
> kernel memory management features for tracking userspace page faults.
> Perhaps you can use that infrastructure to trap guest writes.
> 
> Stefan

I will look into Paolo's new infrastructure first, and post an update on my progress later.
Thanks

end of thread, other threads:[~2014-03-06  9:15 UTC | newest]

Thread overview: 26+ messages
2014-03-03  1:13 [Qemu-devel] [RFC]VM live snapshot proposal Huangpeng (Peter)
2014-03-03 12:32 ` Stefan Hajnoczi
2014-03-03 12:55   ` Kevin Wolf
2014-03-03 13:19     ` Paolo Bonzini
2014-03-03 13:30       ` Kevin Wolf
2014-03-03 13:47         ` Paolo Bonzini
2014-03-03 14:04           ` Kevin Wolf
2014-03-03 14:55           ` Dr. David Alan Gilbert
2014-03-03 19:52           ` Andrea Arcangeli
2014-03-04  1:35             ` Huangpeng (Peter)
2014-03-05 14:46               ` Andrea Arcangeli
2014-03-05  1:52             ` Huangpeng (Peter)
2014-03-05 14:55               ` Andrea Arcangeli
2014-03-04  1:28         ` Huangpeng (Peter)
2014-03-04  9:40           ` Dr. David Alan Gilbert
2014-03-05  1:00             ` Huangpeng (Peter)
2014-03-05  9:09               ` Paolo Bonzini
2014-03-06  1:42             ` Huangpeng (Peter)
2014-03-06  9:14               ` Dr. David Alan Gilbert
2014-03-04  1:06     ` Huangpeng (Peter)
2014-03-03 13:18   ` Paolo Bonzini
2014-03-04  1:02   ` Huangpeng (Peter)
2014-03-04  8:54     ` Stefan Hajnoczi
2014-03-04  9:05       ` Paolo Bonzini
2014-03-04 11:28         ` Wenchao Xia
2014-03-05  0:46       ` Huangpeng (Peter)
