* [Qemu-devel] [RFC]VM live snapshot proposal @ 2014-03-03 1:13 Huangpeng (Peter) 2014-03-03 12:32 ` Stefan Hajnoczi 0 siblings, 1 reply; 26+ messages in thread From: Huangpeng (Peter) @ 2014-03-03 1:13 UTC (permalink / raw) To: Paolo Bonzini, kwolf@redhat.com, stefanha@gmail.com, qemu-devel@nongnu.org, Wenchao Xia, Pavel Hrdina, KVM devel mailing list Cc: Zhanghailiang

Hi all,

I found some earlier discussion about VM live snapshots, but haven't seen any progress:
https://lists.gnu.org/archive/html/qemu-devel/2013-08/msg02125.html
http://markmail.org/thread/shneezha7kmtosvb#query:+page:1+mid:shneezha7kmtosvb+state:results

Here is another proposal: build on the live-migration scheme and add consistent memory state tracking and saving.

The idea is simple:
1. In the first round, use live migration to save all memory to a snapshot file.
2. Intercept memory modifications, save the old pages to a temporary file, and mark the dirty bits.
3. Merge the temporary file into the original snapshot file.

Detailed process:
(1) Pause the VM
(2) Save the device state to a temporary file (already supported by live migration)
(3) Make the disk snapshot
(4) Enable the page dirty log and the function that saves old dirty pages (which we need to add)
(5) Resume the VM
(6) Begin the first round of iteration: save the entire contents of the VM memory pages to the snapshot file
(7) In the second round of iteration, save the old pages to the snapshot file
(8) Merge the device state that was pre-saved in the temporary file into the snapshot file
(9) End the RAM snapshot and do some cleanup work

Because memory modifications may happen in kvm, qemu, or vhost, the key part is how to provide a common page-modify-tracking-and-saving API. We completed a prototype by simply adding a modified-page tracking/saving function in qemu, and it seems to work fine.

Is this approach acceptable? Or are there any better suggestions?

Thanks
Peter Huang

^ permalink raw reply [flat|nested] 26+ messages in thread
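A minimal sketch of the save-old-pages idea in the proposal above, written in Python rather than QEMU's C and modelling guest RAM as a plain list of pages. The names GuestMemory, live_snapshot and guest_activity are hypothetical and exist only for illustration; they are not QEMU or KVM APIs, and the real tracking would have to cover writes from kvm, qemu and vhost.

```python
PAGE_SIZE = 4096  # bytes per guest page (illustrative value)


class GuestMemory:
    """Toy guest RAM that saves a page's old contents before its first write."""

    def __init__(self, num_pages):
        self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        self.tracking = False
        self.dirty = set()     # pages written since tracking began
        self.old_pages = {}    # page index -> contents when tracking began

    def start_tracking(self):
        # Step (4): enable the dirty log and the old-page saving function.
        self.tracking = True
        self.dirty.clear()
        self.old_pages.clear()

    def write(self, index, data):
        # Intercept the write: keep the old page the first time it is dirtied.
        if self.tracking and index not in self.dirty:
            self.old_pages[index] = self.pages[index]
            self.dirty.add(index)
        self.pages[index] = data


def live_snapshot(mem, guest_activity):
    """Return the page contents as they were when tracking started."""
    mem.start_tracking()
    snapshot = {}
    # Step (6): first round, copy every page while the guest keeps running.
    for index in range(len(mem.pages)):
        guest_activity(mem, index)   # the guest may dirty pages mid-copy
        snapshot[index] = mem.pages[index]
    # Step (7): second round, replace dirtied pages with their saved old copy.
    snapshot.update(mem.old_pages)
    mem.tracking = False
    return [snapshot[i] for i in range(len(mem.pages))]
```

A page dirtied before the first round reaches it is copied with new contents and then corrected in the second round; a page dirtied afterwards already has the right contents in the snapshot, so the second round rewrites it with identical data.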
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 1:13 [Qemu-devel] [RFC]VM live snapshot proposal Huangpeng (Peter) @ 2014-03-03 12:32 ` Stefan Hajnoczi 2014-03-03 12:55 ` Kevin Wolf ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Stefan Hajnoczi @ 2014-03-03 12:32 UTC (permalink / raw) To: Huangpeng (Peter) Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia On Mon, Mar 03, 2014 at 01:13:41AM +0000, Huangpeng (Peter) wrote: Just to summarize the idea of live savevm for people joining the discussion: It should be possible to save a snapshot of the guest (including memory, devices, and disk) without noticeable downtime. The 'savevm' command pauses the guest until the snapshot has been completed and therefore doesn't meet the requirements. > Here I have another proposal, based on the live-migration scheme, add consistent > memory state tracking and saving. > The idea is simple: > 1.First round use live-migration to save all memory to a snapshot file. 
> 2.intercept the action of memory-modify, save old pages to a temporary file and mark dirty-bits, > 3.Merge temporary file to the original snapshot file > > Detailed process: > (1)Pause VM > (2) Save the device status to a temporary file (live-migration already supported ) > (3) Make disk snapshot > (4) Enable page dirty log and old dirty pages save function(which we need to add) > (5) Resume VM > (6) Begin the first round of iteration, we save the entire contents of the VM memory pages > to the snapshot file > (7) In the second round of iteration , we save the old page to the snapshot file > (8) Merge data of device status which is pre-saved in temporary files to the snapshot file > (8) End ram snapshot and some cleanup work > > Due to memory-modifications may happen in kvm, qemu, or vhost, the key-part is how we > can provide common page-modify-tracking-and-saving api, we completed a prototype by > simply add modified-page tracking/saving function in qemu, and it seems worked fine. Yes, this is the tricky part. To be honest, I think this is the reason no one has submitted patches - it's a hard task and the win isn't that great (you can already migrate to file). But back to the options: If the host has enough free memory to fork QEMU, a small helper process can be used to save the copy-on-write memory snapshot (thanks to fork(2) semantics). The hard part about the fork(2) approach is that QEMU isn't really designed to fork, so work is necessary to reach a quiescent state for the child process. If there is not enough memory to fork, then a synchronous approach to catching guest memory writes is needed. I'm not sure if a good mechanism for that exists but the simplest would be mprotect(2) and a signal handler (which will make the guest run very slowly). Stefan ^ permalink raw reply [flat|nested] 26+ messages in thread
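The fork(2) option described above can be illustrated with a small Python sketch. It only demonstrates the copy-on-write semantics of fork on a Unix host, using a bytearray as a stand-in for guest RAM; fork_snapshot is a hypothetical helper name, and none of the work needed to bring a real QEMU process to a quiescent state before forking is modelled here.

```python
import os
import tempfile


def fork_snapshot(memory, path):
    """Fork a helper that writes `memory` as it was at fork time to `path`.

    Returns the helper's pid. The caller keeps running and may modify
    `memory` immediately; the child still sees the old contents.
    """
    pid = os.fork()
    if pid == 0:
        # Child: works on a copy-on-write image of the parent's memory.
        status = 1
        try:
            with open(path, "wb") as f:
                f.write(memory)
            status = 0
        finally:
            os._exit(status)  # leave without running the parent's cleanup
    return pid


# Usage: the parent overwrites its "RAM" right after forking.
guest_ram = bytearray(b"A" * 65536)
with tempfile.TemporaryDirectory() as tmp:
    snap_path = os.path.join(tmp, "ram.snap")
    helper = fork_snapshot(guest_ram, snap_path)
    guest_ram[:] = b"B" * len(guest_ram)
    os.waitpid(helper, 0)
    with open(snap_path, "rb") as f:
        print("snapshot holds pre-fork contents:", f.read() == b"A" * 65536)
```

The cost noted in the message still applies: every page the parent touches while the helper is alive is duplicated by the kernel, so in the worst case the host needs enough free memory for a second copy of guest RAM.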
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 12:32 ` Stefan Hajnoczi @ 2014-03-03 12:55 ` Kevin Wolf 2014-03-03 13:19 ` Paolo Bonzini 2014-03-04 1:06 ` Huangpeng (Peter) 2014-03-03 13:18 ` Paolo Bonzini 2014-03-04 1:02 ` Huangpeng (Peter) 2 siblings, 2 replies; 26+ messages in thread From: Kevin Wolf @ 2014-03-03 12:55 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Huangpeng (Peter), qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia Am 03.03.2014 um 13:32 hat Stefan Hajnoczi geschrieben: > On Mon, Mar 03, 2014 at 01:13:41AM +0000, Huangpeng (Peter) wrote: > > Just to summarize the idea of live savevm for people joining the > discussion: > > It should be possible to save a snapshot of the guest (including memory, > devices, and disk) without noticable downtime. > > The 'savevm' command pauses the guest until the snapshot has been > completed and therefore doesn't meet the requirements. > > > Here I have another proposal, based on the live-migration scheme, add consistent > > memory state tracking and saving. > > The idea is simple: > > 1.First round use live-migration to save all memory to a snapshot file. > > 2.intercept the action of memory-modify, save old pages to a temporary file and mark dirty-bits, > > 3.Merge temporary file to the original snapshot file Why do you need a temporary file for this? Couldn't you directly store the memory to its final destination in the snapshot file? 
> > Detailed process: > > (1)Pause VM > > (2) Save the device status to a temporary file (live-migration already supported ) > > (3) Make disk snapshot > > (4) Enable page dirty log and old dirty pages save function(which we need to add) > > (5) Resume VM > > (6) Begin the first round of iteration, we save the entire contents of the VM memory pages > > to the snapshot file > > (7) In the second round of iteration , we save the old page to the snapshot file > > (8) Merge data of device status which is pre-saved in temporary files to the snapshot file > > (8) End ram snapshot and some cleanup work > > > > Due to memory-modifications may happen in kvm, qemu, or vhost, the key-part is how we > > can provide common page-modify-tracking-and-saving api, we completed a prototype by > > simply add modified-page tracking/saving function in qemu, and it seems worked fine. > > Yes, this is the tricky part. To be honest, I think this is the reason > no one has submitted patches - it's a hard task and the win isn't that > great (you can already migrate to file). So why don't we simply reuse the existing migration code? Kevin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 12:55 ` Kevin Wolf @ 2014-03-03 13:19 ` Paolo Bonzini 2014-03-03 13:30 ` Kevin Wolf 2014-03-04 1:06 ` Huangpeng (Peter) 1 sibling, 1 reply; 26+ messages in thread From: Paolo Bonzini @ 2014-03-03 13:19 UTC (permalink / raw) To: Kevin Wolf, Stefan Hajnoczi Cc: Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Huangpeng (Peter), qemu-devel@nongnu.org, Wenchao Xia Il 03/03/2014 13:55, Kevin Wolf ha scritto: >>> > > Due to memory-modifications may happen in kvm, qemu, or vhost, the key-part is how we >>> > > can provide common page-modify-tracking-and-saving api, we completed a prototype by >>> > > simply add modified-page tracking/saving function in qemu, and it seems worked fine. >> > >> > Yes, this is the tricky part. To be honest, I think this is the reason >> > no one has submitted patches - it's a hard task and the win isn't that >> > great (you can already migrate to file). > So why don't we simply reuse the existing migration code? I think this is different in the same way that block-backup and block-mirror are different. Huangpeng's proposal would let you make a consistent snapshot of disks and RAM. Paolo ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 13:19 ` Paolo Bonzini @ 2014-03-03 13:30 ` Kevin Wolf 2014-03-03 13:47 ` Paolo Bonzini 2014-03-04 1:28 ` Huangpeng (Peter) 0 siblings, 2 replies; 26+ messages in thread From: Kevin Wolf @ 2014-03-03 13:30 UTC (permalink / raw) To: Paolo Bonzini Cc: Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, Huangpeng (Peter), qemu-devel@nongnu.org, Wenchao Xia Am 03.03.2014 um 14:19 hat Paolo Bonzini geschrieben: > Il 03/03/2014 13:55, Kevin Wolf ha scritto: > >>>> > Due to memory-modifications may happen in kvm, qemu, or vhost, the key-part is how we > >>>> > can provide common page-modify-tracking-and-saving api, we completed a prototype by > >>>> > simply add modified-page tracking/saving function in qemu, and it seems worked fine. > >>> > >>> Yes, this is the tricky part. To be honest, I think this is the reason > >>> no one has submitted patches - it's a hard task and the win isn't that > >>> great (you can already migrate to file). > >So why don't we simply reuse the existing migration code? > > I think this is different in the same way that block-backup and > block-mirror are different. Huangpeng's proposal would let you make > a consistent snapshot of disks and RAM. Right. Though the point isn't about consistency (doing the disk snapshot when memory has converged would be consistent as well), but about having the snapshot semantically right at the time when the monitor command is issued instead of only starting it then and being consistent at the point of completion. This is indeed like pre/post-copy live migration, and probably both options have their uses. I would suggest starting with the easy one, and adding the post-copy feature on top. Kevin ^ permalink raw reply [flat|nested] 26+ messages in thread
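To make the distinction above concrete, here is a rough Python model of a pre-copy style save. It is a sketch under simplifying assumptions: the guest runs between copy rounds rather than concurrently, and precopy_save, run_guest, max_dirty and max_rounds are hypothetical names, not QEMU's migration code. The point it shows is that the saved image matches memory at the moment the loop finishes, not at the moment the command was issued.

```python
def precopy_save(pages, dirty, run_guest, max_dirty=1, max_rounds=10):
    """Iteratively copy pages; return the image as of the final pause.

    pages:     list of page contents, mutated by run_guest
    dirty:     set of page indexes that run_guest adds to when it writes
    run_guest: callable that lets the guest run for one round
    """
    image = [None] * len(pages)
    pending = set(range(len(pages)))   # first round: every page
    for _ in range(max_rounds):
        dirty.clear()                  # start a fresh dirty log
        for index in sorted(pending):
            image[index] = pages[index]
        run_guest()                    # guest dirties pages in the meantime
        pending = set(dirty)
        if len(pending) <= max_dirty:
            break                      # converged enough to pause the guest
    # Guest is paused here: copy whatever is still dirty.
    for index in sorted(pending):
        image[index] = pages[index]
    return image
```

A point-in-time snapshot would instead have to return the contents the pages had before the first round started, which is what the old-page saving in the original proposal is for.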
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 13:30 ` Kevin Wolf @ 2014-03-03 13:47 ` Paolo Bonzini 2014-03-03 14:04 ` Kevin Wolf ` (2 more replies) 2014-03-04 1:28 ` Huangpeng (Peter) 1 sibling, 3 replies; 26+ messages in thread From: Paolo Bonzini @ 2014-03-03 13:47 UTC (permalink / raw) To: Kevin Wolf Cc: Andrea Arcangeli, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, Huangpeng (Peter), qemu-devel@nongnu.org, Wenchao Xia Il 03/03/2014 14:30, Kevin Wolf ha scritto: > > > So why don't we simply reuse the existing migration code? > > I think this is different in the same way that block-backup and > > block-mirror are different. Huangpeng's proposal would let you make > > a consistent snapshot of disks and RAM. > Right. Though the point isn't about consistency (doing the disk snapshot > when memory has converged would be consistent as well), but about > having the snapshot semantically right at the time when the monitor > command is issued instead of only starting it then and being consistent > at the point of completion. Right---though it's not entirely true that migration only affects the point in time where you have consistency. For example, with migration you cannot use the guest agent for freeze/thaw and, even if we changed the code to allow that, the pause would be much longer than for live snapshots or block-backup. > This is indeed like pre/post-copy live migration, and probably both > options have their uses. I would suggest starting with the easy one, and > adding the post-copy feature on top. 
The feature matrix for migration and snapshot

                          disk       RAM       internal snapshot
  non-live                yes (0)    yes (0)   yes
  live, disk only         yes (1)    N/A       yes (2)
  live, pre-copy          yes (3)    yes       no
  live, post-copy         yes (4)    no        no
  live, point-in-time     yes (5)    no        no

  (0) just stop VM while doing normal pre-copy migration
  (1) blockdev-snapshot-sync
  (2) blockdev-snapshot-internal-sync
  (3) block-stream
  (4) drive-mirror
  (5) drive-backup

By "the easy one" you mean live savevm with snapshot at the end of RAM migration, I guess. But the functionality is already available using migration, while point-in-time snapshots actually add new functionality. I'm not sure what's the status of the kernel infrastructure for post-copy. Andrea?

Paolo

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 13:47 ` Paolo Bonzini @ 2014-03-03 14:04 ` Kevin Wolf 2014-03-03 14:55 ` Dr. David Alan Gilbert 2014-03-03 19:52 ` Andrea Arcangeli 2 siblings, 0 replies; 26+ messages in thread From: Kevin Wolf @ 2014-03-03 14:04 UTC (permalink / raw) To: Paolo Bonzini Cc: Andrea Arcangeli, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, Huangpeng (Peter), qemu-devel@nongnu.org, Wenchao Xia Am 03.03.2014 um 14:47 hat Paolo Bonzini geschrieben: > Il 03/03/2014 14:30, Kevin Wolf ha scritto: > >> > So why don't we simply reuse the existing migration code? > >> I think this is different in the same way that block-backup and > >> block-mirror are different. Huangpeng's proposal would let you make > >> a consistent snapshot of disks and RAM. > >Right. Though the point isn't about consistency (doing the disk snapshot > >when memory has converged would be consistent as well), but about > >having the snapshot semantically right at the time when the monitor > >command is issued instead of only starting it then and being consistent > >at the point of completion. > > Right---though it's not entirely true that migration only affects > the point in time where you have consistency. For example, with > migration you cannot use the guest agent for freeze/thaw and, even > if we changed the code to allow that, the pause would be much longer > than for live snapshots or block-backup. > > >This is indeed like pre/post-copy live migration, and probably both > >options have their uses. I would suggest starting with the easy one, and > >adding the post-copy feature on top. 
> > The feature matrix for migration and snapshot > > disk RAM internal snapshot > non-live yes (0) yes (0) yes > live, disk only yes (1) N/A yes (2) > live, pre-copy yes (3) yes no > live, post-copy yes (4) no no > live, point-in-time yes (5) no no > > (0) just stop VM while doing normal pre-copy migration > (1) blockdev-snapshot-sync > (2) blockdev-snapshot-internal-sync > (3) block-stream > (4) drive-mirror > (5) drive-backup > > By "the easy one" you mean live savevm with snapshot at the end of > RAM migration, I guess. But the functionality is already available > using migration, while point-in-time snapshots actually add new > functionality. I'm not sure what's the status of the kernel > infrastructure for post-copy. Andrea? Yes, it's available, but not with internal snapshots, but only with RAM snapshots stored in an external file. An incremental next step would be to avoid writing dirtied memory to two places, because internal snapshots aren't a streaming, but a random access interface, so you can overwrite the original place instead of appending the new copy. That would already be a small advantage. Once you have this infrastructure, it's probably also a bit easier to plug in any post-copy/point-in-time features that the migration code can (be improved to) provide. Kevin ^ permalink raw reply [flat|nested] 26+ messages in thread
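The difference between a streaming and a random-access destination mentioned above can be sketched in a few lines of Python. The record layout (an 8-byte page index followed by the page) and the helper names save_streaming and save_random_access are assumptions made for illustration; they do not reflect QEMU's migration stream or the qcow2 internal snapshot format.

```python
import io

PAGE_SIZE = 4096  # bytes per page (illustrative value)


def save_streaming(stream, index, data):
    """Append an (index, page) record; a re-sent page adds another record."""
    stream.write(index.to_bytes(8, "little") + data)


def save_random_access(image, index, data):
    """Write the page at its fixed offset; a re-sent page overwrites in place."""
    image.seek(index * PAGE_SIZE)
    image.write(data)


# Usage: page 0 is dirtied and sent a second time.
stream, image = io.BytesIO(), io.BytesIO()
for index, fill in [(0, b"a"), (1, b"b"), (0, b"c")]:
    page = fill * PAGE_SIZE
    save_streaming(stream, index, page)
    save_random_access(image, index, page)
print("streaming bytes:", len(stream.getvalue()))
print("random-access bytes:", len(image.getvalue()))
```

With the streaming destination every re-sent page grows the file, while the random-access destination stays at one slot per page, which is the small advantage the message refers to.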
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 13:47 ` Paolo Bonzini 2014-03-03 14:04 ` Kevin Wolf @ 2014-03-03 14:55 ` Dr. David Alan Gilbert 2014-03-03 19:52 ` Andrea Arcangeli 2 siblings, 0 replies; 26+ messages in thread From: Dr. David Alan Gilbert @ 2014-03-03 14:55 UTC (permalink / raw) To: Paolo Bonzini Cc: Kevin Wolf, Andrea Arcangeli, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, Huangpeng (Peter), mrhines, qemu-devel@nongnu.org, Wenchao Xia * Paolo Bonzini (pbonzini@redhat.com) wrote: > Il 03/03/2014 14:30, Kevin Wolf ha scritto: > >> > So why don't we simply reuse the existing migration code? > >> I think this is different in the same way that block-backup and > >> block-mirror are different. Huangpeng's proposal would let you make > >> a consistent snapshot of disks and RAM. > >Right. Though the point isn't about consistency (doing the disk snapshot > >when memory has converged would be consistent as well), but about > >having the snapshot semantically right at the time when the monitor > >command is issued instead of only starting it then and being consistent > >at the point of completion. > > Right---though it's not entirely true that migration only affects > the point in time where you have consistency. For example, with > migration you cannot use the guest agent for freeze/thaw and, even > if we changed the code to allow that, the pause would be much longer > than for live snapshots or block-backup. > > >This is indeed like pre/post-copy live migration, and probably both > >options have their uses. I would suggest starting with the easy one, and > >adding the post-copy feature on top. 
> > The feature matrix for migration and snapshot > > disk RAM internal snapshot > non-live yes (0) yes (0) yes > live, disk only yes (1) N/A yes (2) > live, pre-copy yes (3) yes no > live, post-copy yes (4) no no > live, point-in-time yes (5) no no > > (0) just stop VM while doing normal pre-copy migration > (1) blockdev-snapshot-sync > (2) blockdev-snapshot-internal-sync > (3) block-stream > (4) drive-mirror > (5) drive-backup > > By "the easy one" you mean live savevm with snapshot at the end of > RAM migration, I guess. But the functionality is already available > using migration, while point-in-time snapshots actually add new > functionality. I'm not sure what's the status of the kernel > infrastructure for post-copy. Andrea? Accumulating the running set of changes that migration is spitting out gets you some of the way, but to do it you need points in the migration stream that represent a consistent view of device state, RAM and disk, and I think the tricky part is getting those consistent points. While the CPU is running, the pages that migration spits out are certainly newer than old versions of those pages, but I don't think you can just put a marker in and say that the point represents a single consistent view of RAM. In many ways this is the opposite of Michael Hines's microcheckpointing approach, which stops everything and takes the snapshot regularly; I did suggest that one modification to that would be to COW those checkpoints. Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 13:47 ` Paolo Bonzini 2014-03-03 14:04 ` Kevin Wolf 2014-03-03 14:55 ` Dr. David Alan Gilbert @ 2014-03-03 19:52 ` Andrea Arcangeli 2014-03-04 1:35 ` Huangpeng (Peter) 2014-03-05 1:52 ` Huangpeng (Peter) 2 siblings, 2 replies; 26+ messages in thread From: Andrea Arcangeli @ 2014-03-03 19:52 UTC (permalink / raw) To: Paolo Bonzini Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, Huangpeng (Peter), qemu-devel@nongnu.org, Wenchao Xia Hi Paolo, On Mon, Mar 03, 2014 at 02:47:31PM +0100, Paolo Bonzini wrote: > I'm not sure what's the status of the kernel infrastructure for > post-copy. Andrea? sys_userfaultfd is still a work in progress, but there shouldn't be much work left to complete it. madvise(MADV_USERFAULT) and remap_anon_pages() have been complete for a while. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 19:52 ` Andrea Arcangeli @ 2014-03-04 1:35 ` Huangpeng (Peter) 2014-03-05 14:46 ` Andrea Arcangeli 2014-03-05 1:52 ` Huangpeng (Peter) 1 sibling, 1 reply; 26+ messages in thread From: Huangpeng (Peter) @ 2014-03-04 1:35 UTC (permalink / raw) To: Andrea Arcangeli, Paolo Bonzini Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, qemu-devel@nongnu.org, Wenchao Xia > Hi Paolo, > > On Mon, Mar 03, 2014 at 02:47:31PM +0100, Paolo Bonzini wrote: > > I'm not sure what's the status of the kernel infrastructure for > > post-copy. Andrea? > > sys_userfaultfd is still work in progress but it shouldn't be much work left to > completion. madvise(MADV_USERFAULT) and > remap_anon_pages() are complete for a while. http://qemu-project.org/Features/PostCopyLiveMigration From the feature description, post-copy uses a memory copy, so this infrastructure will solve that problem but does not help with snapshots, am I right? Thanks ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-04 1:35 ` Huangpeng (Peter) @ 2014-03-05 14:46 ` Andrea Arcangeli 0 siblings, 0 replies; 26+ messages in thread From: Andrea Arcangeli @ 2014-03-05 14:46 UTC (permalink / raw) To: Huangpeng (Peter) Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia Hi, On Tue, Mar 04, 2014 at 01:35:53AM +0000, Huangpeng (Peter) wrote: > > > Hi Paolo, > > > > On Mon, Mar 03, 2014 at 02:47:31PM +0100, Paolo Bonzini wrote: > > > I'm not sure what's the status of the kernel infrastructure for > > > post-copy. Andrea? > > > > sys_userfaultfd is still work in progress but it shouldn't be much work left to > > completion. madvise(MADV_USERFAULT) and > > remap_anon_pages() are complete for a while. > > http://qemu-project.org/Features/PostCopyLiveMigration > From the feature description, post-copy uses memory copy, so this infrastructure > will solve this problem, but do not help snapshot, am I right? Correct, there's no copy with this infrastructure, other than whatever data copy may be happening inside the network receive protocol for skb linearization into userland memory. With RDMA or zerocopy DMA receive mechanisms, there may be no copy at all. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 19:52 ` Andrea Arcangeli 2014-03-04 1:35 ` Huangpeng (Peter) @ 2014-03-05 1:52 ` Huangpeng (Peter) 2014-03-05 14:55 ` Andrea Arcangeli 1 sibling, 1 reply; 26+ messages in thread From: Huangpeng (Peter) @ 2014-03-05 1:52 UTC (permalink / raw) To: Andrea Arcangeli, Paolo Bonzini Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, qemu-devel@nongnu.org, Wenchao Xia Hi, Andrea Where can I get the dev-git-branch? I can use it to try the snapshot prototype coding. Thanks. > -----Original Message----- > From: Andrea Arcangeli [mailto:aarcange@redhat.com] > Sent: Tuesday, March 04, 2014 3:52 AM > To: Paolo Bonzini > Cc: Kevin Wolf; Stefan Hajnoczi; Huangpeng (Peter); qemu-devel@nongnu.org; > Wenchao Xia; Pavel Hrdina; KVM devel mailing list; Zhanghailiang > Subject: Re: [RFC]VM live snapshot proposal > > Hi Paolo, > > On Mon, Mar 03, 2014 at 02:47:31PM +0100, Paolo Bonzini wrote: > > I'm not sure what's the status of the kernel infrastructure for > > post-copy. Andrea? > > sys_userfaultfd is still work in progress but it shouldn't be much work left to > completion. madvise(MADV_USERFAULT) and > remap_anon_pages() are complete for a while. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-05 1:52 ` Huangpeng (Peter) @ 2014-03-05 14:55 ` Andrea Arcangeli 0 siblings, 0 replies; 26+ messages in thread From: Andrea Arcangeli @ 2014-03-05 14:55 UTC (permalink / raw) To: Huangpeng (Peter) Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia On Wed, Mar 05, 2014 at 01:52:14AM +0000, Huangpeng (Peter) wrote: > Hi, Andrea > > Where can I get the dev-git-branch? > I can use it to try the snapshot prototype coding. You can find the current status in the origin/master branch here http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git however userfaultfd is still missing, so it's not yet good for transparent userfault when it's O_DIRECT or other gup users triggering the access (those would currently return an error to userland if they hit a userfault vma, and we don't want userland to ever get an error, because the modifications to userland needed to handle it would be too big). userfaultfd will let the kernel wait on an event from the migration thread and talk with the migration thread directly. So userland won't be able to notice the userfault happening inside a write() or kvm ioctl() syscall (you could notice only if you strace the migration thread). That's more efficient too, because the host scheduler can switch directly to the migration thread without having to return to userland first. And after remap_anon_pages completes and the host scheduler runs the vcpu or I/O thread again, gup_fast can continue in kernel mode from where it stopped, without unnecessary exits to userland. Making the kernel speak directly to the migration thread is somewhat trickier at the kernel level than what you find in aa.git right now, but it is worth it to be transparent to all syscalls that would trip on userfaults with gup_fast. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 13:30 ` Kevin Wolf 2014-03-03 13:47 ` Paolo Bonzini @ 2014-03-04 1:28 ` Huangpeng (Peter) 2014-03-04 9:40 ` Dr. David Alan Gilbert 1 sibling, 1 reply; 26+ messages in thread From: Huangpeng (Peter) @ 2014-03-04 1:28 UTC (permalink / raw) To: Kevin Wolf, Paolo Bonzini Cc: Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, qemu-devel@nongnu.org, Wenchao Xia > > I think this is different in the same way that block-backup and > > block-mirror are different. Huangpeng's proposal would let you make a > > consistent snapshot of disks and RAM. > > Right. Though the point isn't about consistency (doing the disk snapshot when > memory has converged would be consistent as well), but about having the > snapshot semantically right at the time when the monitor command is issued > instead of only starting it then and being consistent at the point of completion. > > This is indeed like pre/post-copy live migration, and probably both options have > their uses. I would suggest starting with the easy one, and adding the > post-copy feature on top. > Good suggestion. The latest post-copy patches seem to have been updated 2 years ago. https://github.com/yamahata/qemu One question: Can post-copy fall back if exceptions happen during post-copy? Thanks ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-04 1:28 ` Huangpeng (Peter) @ 2014-03-04 9:40 ` Dr. David Alan Gilbert 2014-03-05 1:00 ` Huangpeng (Peter) 2014-03-06 1:42 ` Huangpeng (Peter) 0 siblings, 2 replies; 26+ messages in thread From: Dr. David Alan Gilbert @ 2014-03-04 9:40 UTC (permalink / raw) To: Huangpeng (Peter) Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia * Huangpeng (Peter) (peter.huangpeng@huawei.com) wrote: > > > I think this is different in the same way that block-backup and > > > block-mirror are different. Huangpeng's proposal would let you make a > > > consistent snapshot of disks and RAM. > > > > Right. Though the point isn't about consistency (doing the disk snapshot when > > memory has converged would be consistent as well), but about having the > > snapshot semantically right at the time when the monitor command is issued > > instead of only starting it then and being consistent at the point of completion. > > > > This is indeed like pre/post-copy live migration, and probably both options have > > their uses. I would suggest starting with the easy one, and adding the > > post-copy feature on top. > > > > Good suggestion, The latest patches of post-copy seems updated 2 years ago. > https://github.com/yamahata/qemu I'm working on post-copy at the moment, using Andrea's kernel code, using bits of Yamahata's code base as well; hopefully it won't be too long until we have something to post. > One question: > Can post-copy fallback if exceptions happen during post-copy? What do you mean by 'exceptions' here? Generally postcopy can't fall back to precopy because once you're in postcopy mode the state is split between the two machines. Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-04 9:40 ` Dr. David Alan Gilbert @ 2014-03-05 1:00 ` Huangpeng (Peter) 2014-03-05 9:09 ` Paolo Bonzini 2014-03-06 1:42 ` Huangpeng (Peter) 1 sibling, 1 reply; 26+ messages in thread From: Huangpeng (Peter) @ 2014-03-05 1:00 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia > > Good suggestion, The latest patches of post-copy seems updated 2 years > ago. > > https://github.com/yamahata/qemu > > I'm working on post-copy at the moment, using Andrea's kernel code, using bits > of Yamahata's code base as well; hopefully it won't be too long until we have > something to post. > > > One question: > > Can post-copy fallback if exceptions happen during post-copy? > > What do you mean by 'exceptions' here? Generally postcopy can't fall back to > precopy because once you're in postcopy mode the state is split between the > two machines. > For example, the destination VM being interrupted by a memory-copy error or some other exception. With the pre-copy scheme, we can fall back to the source VM. One simple question (it may have been discussed before): what kind of scenario is post-copy aimed at? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-05 1:00 ` Huangpeng (Peter) @ 2014-03-05 9:09 ` Paolo Bonzini 0 siblings, 0 replies; 26+ messages in thread From: Paolo Bonzini @ 2014-03-05 9:09 UTC (permalink / raw) To: Huangpeng (Peter), Dr. David Alan Gilbert Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, qemu-devel@nongnu.org, Wenchao Xia On 05/03/2014 02:00, Huangpeng (Peter) wrote: >>> One question: >>> Can post-copy fallback if exceptions happen during post-copy? >> >> What do you mean by 'exceptions' here? Generally postcopy can't fall back to >> precopy because once you're in postcopy mode the state is split between the >> two machines. > > Like destination VM interrupted due to memory-copy error or other exceptions, > with pre-copy scheme, we can fall-back to the source-vm. No, postcopy cannot do that. However, this is a limitation of postcopy, not of the kernel interfaces that Andrea is adding. If you use those interfaces to implement live VM point-in-time snapshots, you can drop the snapshotting operation safely and keep the VM running. > One simple question(may be discussed before), what kind of scenario does post-copy > aim for? Mostly cases where pre-copy migration doesn't converge because the guest is too big, or when you require a really, really small downtime. It can be useful when you have to evacuate a host as fast as possible (due to detecting an intrusion or impending hardware failure), because the alternative is to shut down the VM immediately. It can also be used for upgrading QEMU on the host where the VM is running; in this case you can use a Unix socket for transport, and eliminate the chance of migration failing due to a network problem. Paolo ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-04 9:40 ` Dr. David Alan Gilbert 2014-03-05 1:00 ` Huangpeng (Peter) @ 2014-03-06 1:42 ` Huangpeng (Peter) 2014-03-06 9:14 ` Dr. David Alan Gilbert 1 sibling, 1 reply; 26+ messages in thread From: Huangpeng (Peter) @ 2014-03-06 1:42 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia Hi David Where can I get your post-copy git tree? I wish to take a look into it first before start live-snapshot design. Thanks. > I'm working on post-copy at the moment, using Andrea's kernel code, using bits > of Yamahata's code base as well; hopefully it won't be too long until we have > something to post. > > > One question: > > Can post-copy fallback if exceptions happen during post-copy? > > What do you mean by 'exceptions' here? Generally postcopy can't fall back to > precopy because once you're in postcopy mode the state is split between the > two machines. > > Dave > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-06 1:42 ` Huangpeng (Peter) @ 2014-03-06 9:14 ` Dr. David Alan Gilbert 0 siblings, 0 replies; 26+ messages in thread From: Dr. David Alan Gilbert @ 2014-03-06 9:14 UTC (permalink / raw) To: Huangpeng (Peter) Cc: Kevin Wolf, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia * Huangpeng (Peter) (peter.huangpeng@huawei.com) wrote: > Hi David > > Where can I get your post-copy git tree? > I wish to take a look into it first before start live-snapshot design. It's not yet published, as soon as it shows signs of life and I tidy it up I'll get it out there. Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 12:55 ` Kevin Wolf 2014-03-03 13:19 ` Paolo Bonzini @ 2014-03-04 1:06 ` Huangpeng (Peter) 1 sibling, 0 replies; 26+ messages in thread From: Huangpeng (Peter) @ 2014-03-04 1:06 UTC (permalink / raw) To: Kevin Wolf, Stefan Hajnoczi Cc: Pavel Hrdina, Zhanghailiang, KVM devel mailing list, qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia > > > Here I have another proposal, based on the live-migration scheme, > > > add consistent memory state tracking and saving. > > > The idea is simple: > > > 1.First round use live-migration to save all memory to a snapshot file. > > > 2.intercept the action of memory-modify, save old pages to a > > > temporary file and mark dirty-bits, 3.Merge temporary file to the > > > original snapshot file > > Why do you need a temporary file for this? Couldn't you directly store the > memory to its final destination in the snapshot file? > Writing to the same snapshot file requires handling write protection. For now we implemented the prototype in the simplest way; if this proposal is accepted, we will look into writing directly to the snapshot file. Thanks. ^ permalink raw reply [flat|nested] 26+ messages in thread
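Kevin's question about storing pages directly at their final destination can be illustrated with a small sketch. It is not taken from the prototype discussed in the thread: the fixed layout (page `idx` lives at file offset `idx * page_size`), the `saved` flag array, and the function name `save_page_once` are all assumptions made for illustration. With such a layout, the old-page save path and the bulk pass target the same slot, and whichever reaches a page first wins, so no merge step is needed. The sketch is single-threaded; a real implementation would also have to serialize the two writers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: store one page at its final slot in the snapshot
 * file, unless that slot already holds point-in-time content.
 *
 * Assumed layout: page idx lives at file offset idx * page_size.
 * saved[idx] is nonzero once the slot has been written.
 *
 * Returns 1 if the page was written, 0 if the slot was already filled,
 * and -1 on I/O error.
 */
int save_page_once(int fd, uint8_t *saved, size_t idx,
                   const uint8_t *page, size_t page_size)
{
    size_t done = 0;

    assert(saved != NULL && page != NULL);

    if (saved[idx]) {
        return 0;               /* first writer already stored this page */
    }

    /* pwrite() may write fewer bytes than requested, so loop until done. */
    while (done < page_size) {
        ssize_t n = pwrite(fd, page + done, page_size - done,
                           (off_t)(idx * page_size + done));
        if (n <= 0) {
            return -1;
        }
        done += (size_t)n;
    }

    saved[idx] = 1;
    return 1;
}
```

In this sketch, a write-fault path would call `save_page_once()` with the old contents of a page before the guest modifies it, and the bulk pass would call it for every page in order; the bulk pass then skips pages whose old contents were already stored.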
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 12:32 ` Stefan Hajnoczi 2014-03-03 12:55 ` Kevin Wolf @ 2014-03-03 13:18 ` Paolo Bonzini 2014-03-04 1:02 ` Huangpeng (Peter) 2 siblings, 0 replies; 26+ messages in thread From: Paolo Bonzini @ 2014-03-03 13:18 UTC (permalink / raw) To: Stefan Hajnoczi, Huangpeng (Peter) Cc: kwolf@redhat.com, Andrea Arcangeli, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, qemu-devel@nongnu.org, Dr David Alan Gilbert, Wenchao Xia On 03/03/2014 13:32, Stefan Hajnoczi wrote: > If there is not enough memory to fork, then a synchronous approach to > catching guest memory writes is needed. I'm not sure if a good > mechanism for that exists but the simplest would be mprotect(2) and a > signal handler (which will make the guest run very slowly). I think we'll be adding such a mechanism, but for guest memory reads, for postcopy migration. Perhaps it could be reused for live snapshotting? Paolo ^ permalink raw reply [flat|nested] 26+ messages in thread
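The mprotect(2) plus signal handler approach that Stefan describes can be sketched in a few lines of C. This is a toy under stated assumptions, not QEMU code: `guest_ram`, `old_pages`, `dirty`, `tracker_init`, `tracker_arm`, and `guest_write` are hypothetical names, the "guest RAM" is a four-page anonymous mapping, and the sketch assumes Linux, where a write to a read-only page raises SIGSEGV. Calling `mprotect()` from a signal handler is common practice for this pattern but is not on the POSIX async-signal-safe list.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 4                      /* size of the toy "guest RAM" */

uint8_t *guest_ram;                   /* hypothetical stand-in for guest memory */
uint8_t *old_pages;                   /* pre-write copy of each page */
volatile sig_atomic_t dirty[NPAGES];  /* 1 once a page has been written */
size_t page_size;

/* First write to a protected page: save old contents, then unprotect it. */
static void on_write_fault(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)guest_ram;
    (void)sig;
    (void)ctx;

    if (addr < base || addr >= base + NPAGES * page_size) {
        /* Not one of our pages: restore default action so the retried
         * access crashes the process as it normally would. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }

    size_t idx = (addr - base) / page_size;
    uint8_t *page = guest_ram + idx * page_size;

    /* The page is still readable, so the old contents can be copied. */
    memcpy(old_pages + idx * page_size, page, page_size);
    dirty[idx] = 1;

    /* Allow the faulting write to proceed when the handler returns. */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
}

/* Allocate the toy guest RAM and the area for saved old pages. */
int tracker_init(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = NPAGES * page_size;

    guest_ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    old_pages = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (guest_ram == MAP_FAILED || old_pages == MAP_FAILED) ? -1 : 0;
}

/* Install the handler and write-protect all of guest RAM. */
int tracker_arm(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) {
        return -1;
    }
    for (size_t i = 0; i < NPAGES; i++) {
        dirty[i] = 0;
    }
    return mprotect(guest_ram, NPAGES * page_size, PROT_READ);
}

/* Store through a volatile pointer so the write is not reordered. */
void guest_write(size_t off, uint8_t val)
{
    assert(off < NPAGES * page_size);
    *(volatile uint8_t *)(guest_ram + off) = val;
}
```

After `tracker_arm()`, the first `guest_write()` to each page takes one fault, which is where the slowdown Stefan mentions comes from; later writes to the same page run at full speed because the page has been unprotected.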
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-03 12:32 ` Stefan Hajnoczi 2014-03-03 12:55 ` Kevin Wolf 2014-03-03 13:18 ` Paolo Bonzini @ 2014-03-04 1:02 ` Huangpeng (Peter) 2014-03-04 8:54 ` Stefan Hajnoczi 2 siblings, 1 reply; 26+ messages in thread From: Huangpeng (Peter) @ 2014-03-04 1:02 UTC (permalink / raw) To: Stefan Hajnoczi Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia > Yes, this is the tricky part. To be honest, I think this is the reason no one has > submitted patches - it's a hard task and the win isn't that great (you can > already migrate to file). > Yes, many places have to be considered. Although the scenarios are limited, users running, for example, library experiments may need to revert repeatedly to the same VM state (memory state + disk state). The key part is tracking and saving the consistent state at snapshot time. kvm/qemu/vhost already implement dirty tracking, and my proposal will add common save-old-page APIs to save the consistent state. Is this the right way, or do you have other suggestions? > But back to the options: > > If the host has enough free memory to fork QEMU, a small helper process can > be used to save the copy-on-write memory snapshot (thanks to fork(2) > semantics). The hard part about the fork(2) approach is that QEMU isn't > really designed to fork, so work is necessary to reach a quiescent state for the > child process. > > If there is not enough memory to fork, then a synchronous approach to > catching guest memory writes is needed. I'm not sure if a good mechanism > for that exists but the simplest would be mprotect(2) and a signal handler > (which will make the guest run very slowly). > > Stefan In real production environment, memory over-commit or use as much memory as possible may be the normal case, so the fork semantics cannot meet the needs. Is there any other proposals to implement vm-snapshot? Thanks. 
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-04 1:02 ` Huangpeng (Peter) @ 2014-03-04 8:54 ` Stefan Hajnoczi 2014-03-04 9:05 ` Paolo Bonzini 2014-03-05 0:46 ` Huangpeng (Peter) 0 siblings, 2 replies; 26+ messages in thread From: Stefan Hajnoczi @ 2014-03-04 8:54 UTC (permalink / raw) To: Huangpeng (Peter) Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, qemu-devel@nongnu.org, Paolo Bonzini, Wenchao Xia On Tue, Mar 04, 2014 at 01:02:44AM +0000, Huangpeng (Peter) wrote: > > But back to the options: > > > > If the host has enough free memory to fork QEMU, a small helper process can > > be used to save the copy-on-write memory snapshot (thanks to fork(2) > > semantics). The hard part about the fork(2) approach is that QEMU isn't > > really designed to fork, so work is necessary to reach a quiescent state for the > > child process. > > > > If there is not enough memory to fork, then a synchronous approach to > > catching guest memory writes is needed. I'm not sure if a good mechanism > > for that exists but the simplest would be mprotect(2) and a signal handler > > (which will make the guest run very slowly). > > > > Stefan > > In real production environment, memory over-commit or use as much memory as > possible may be the normal case, so the fork semantics cannot meet the needs. Yes, I think you're right. The fork approach only works in the easy case where there is plenty of free host memory. > Is there any other proposals to implement vm-snapshot? See the discussion by Paolo and Andrea about post-copy migration, which adds kernel memory management features for tracking userspace page faults. Perhaps you can use that infrastructure to trap guest writes. Stefan ^ permalink raw reply [flat|nested] 26+ messages in thread
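The fork(2) option discussed above relies on copy-on-write: the child keeps seeing memory as it was at the moment of the fork, even while the parent keeps modifying it. A minimal sketch of that behaviour follows. It is only an illustration with hypothetical names (`snapshot_fork`, `snapshot_wait`) and a plain buffer standing in for guest RAM; it does not address the quiescing work Stefan mentions, and it assumes the buffer is private (not shared) memory.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a helper that writes buf[0..len) to path.
 * Thanks to copy-on-write, the child sees the buffer as it was at fork
 * time, regardless of later writes by the parent.
 * Returns the child's pid in the parent, or -1 if fork() failed.
 */
pid_t snapshot_fork(const uint8_t *buf, size_t len, const char *path)
{
    pid_t pid = fork();
    if (pid != 0) {
        return pid;             /* parent (pid > 0) or fork error (-1) */
    }

    /* Child: save the point-in-time view of the buffer, then exit. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        _exit(1);
    }
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0) {
            _exit(1);
        }
        done += (size_t)n;
    }
    close(fd);
    _exit(0);
}

/* Wait for the helper; returns 0 if it saved the snapshot successfully. */
int snapshot_wait(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The memory cost raised in the thread is visible here too: every page the parent touches while the child is still writing gets duplicated, so in the worst case the host needs room for a second copy of the buffer.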
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-04 8:54 ` Stefan Hajnoczi @ 2014-03-04 9:05 ` Paolo Bonzini 2014-03-04 11:28 ` Wenchao Xia 2014-03-05 0:46 ` Huangpeng (Peter) 1 sibling, 1 reply; 26+ messages in thread From: Paolo Bonzini @ 2014-03-04 9:05 UTC (permalink / raw) To: Stefan Hajnoczi, Huangpeng (Peter) Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, qemu-devel@nongnu.org, Wenchao Xia On 04/03/2014 09:54, Stefan Hajnoczi wrote: >> Is there any other proposals to implement vm-snapshot? > See the discussion by Paolo and Andrea about post-copy migration, which > adds kernel memory management features for tracking userspace page > faults. Perhaps you can use that infrastructure to trap guest writes. That infrastructure actually traps guest reads too. But it's fine, as they are a superset of guest writes and the image will still be consistent. Paolo ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-04 9:05 ` Paolo Bonzini @ 2014-03-04 11:28 ` Wenchao Xia 0 siblings, 0 replies; 26+ messages in thread From: Wenchao Xia @ 2014-03-04 11:28 UTC (permalink / raw) To: Paolo Bonzini Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, Stefan Hajnoczi, Huangpeng (Peter), qemu-devel@nongnu.org, Wenchao Xia On 2014/3/4 17:05, Paolo Bonzini wrote: > On 04/03/2014 09:54, Stefan Hajnoczi wrote: >>> Is there any other proposals to implement vm-snapshot? >> See the discussion by Paolo and Andrea about post-copy migration, which >> adds kernel memory management features for tracking userspace page >> faults. Perhaps you can use that infrastructure to trap guest writes. > > That infrastructure actually traps guest reads too. But it's fine, as > they are a superset of guest writes and the image will still be > consistent. > > Paolo > I heard that the kernel is going to have an API that lets userspace catch memory operations which originally could only be caught by kernel code. I am not sure how it stands now, but if the kernel has it, QEMU can use it more gracefully than by modifying the migration code. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [RFC]VM live snapshot proposal 2014-03-04 8:54 ` Stefan Hajnoczi 2014-03-04 9:05 ` Paolo Bonzini @ 2014-03-05 0:46 ` Huangpeng (Peter) 1 sibling, 0 replies; 26+ messages in thread From: Huangpeng (Peter) @ 2014-03-05 0:46 UTC (permalink / raw) To: Stefan Hajnoczi, Paolo Bonzini Cc: kwolf@redhat.com, Pavel Hrdina, Zhanghailiang, KVM devel mailing list, qemu-devel@nongnu.org, Wenchao Xia > > Is there any other proposals to implement vm-snapshot? > > See the discussion by Paolo and Andrea about post-copy migration, which adds > kernel memory management features for tracking userspace page faults. > Perhaps you can use that infrastructure to trap guest writes. > > Stefan I will look into Paolo's new infrastructure first, and post new progress later. Thanks ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2014-03-06 9:15 UTC | newest]

Thread overview: 26+ messages
2014-03-03 1:13 [Qemu-devel] [RFC]VM live snapshot proposal Huangpeng (Peter)
2014-03-03 12:32 ` Stefan Hajnoczi
2014-03-03 12:55 ` Kevin Wolf
2014-03-03 13:19 ` Paolo Bonzini
2014-03-03 13:30 ` Kevin Wolf
2014-03-03 13:47 ` Paolo Bonzini
2014-03-03 14:04 ` Kevin Wolf
2014-03-03 14:55 ` Dr. David Alan Gilbert
2014-03-03 19:52 ` Andrea Arcangeli
2014-03-04 1:35 ` Huangpeng (Peter)
2014-03-05 14:46 ` Andrea Arcangeli
2014-03-05 1:52 ` Huangpeng (Peter)
2014-03-05 14:55 ` Andrea Arcangeli
2014-03-04 1:28 ` Huangpeng (Peter)
2014-03-04 9:40 ` Dr. David Alan Gilbert
2014-03-05 1:00 ` Huangpeng (Peter)
2014-03-05 9:09 ` Paolo Bonzini
2014-03-06 1:42 ` Huangpeng (Peter)
2014-03-06 9:14 ` Dr. David Alan Gilbert
2014-03-04 1:06 ` Huangpeng (Peter)
2014-03-03 13:18 ` Paolo Bonzini
2014-03-04 1:02 ` Huangpeng (Peter)
2014-03-04 8:54 ` Stefan Hajnoczi
2014-03-04 9:05 ` Paolo Bonzini
2014-03-04 11:28 ` Wenchao Xia
2014-03-05 0:46 ` Huangpeng (Peter)