* [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Chijianchun @ 2013-08-09 10:20 UTC
To: aliguori@us.ibm.com, paul@codesourcery.com, kvm@vger.kernel.org, avi@redhat.com, mtosatti@redhat.com, qemu-devel@nongnu.org

Currently in KVM, the vCPUs must be stopped while a RAM snapshot is taken, which is an unfriendly restriction for users.

Are there plans to implement a live RAM snapshot feature?

As I see it, a snapshot must not occupy too much additional memory, so when a memory page is about to be changed, the old page has to be flushed to the file first. But flushing to a file is much slower than writing to memory, and the vCPU or VM has to be paused until the flush finishes. The result is pause...resume...pause...resume..., and the guest gets slower and slower.

Is this idea feasible? Are there any other thoughts?
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Paolo Bonzini @ 2013-08-09 15:38 UTC
To: Chijianchun
Cc: aliguori@us.ibm.com, kvm@vger.kernel.org, mtosatti@redhat.com, qemu-devel@nongnu.org, paul@codesourcery.com, avi@redhat.com

On 09/08/2013 12:20, Chijianchun wrote:
> Currently in KVM, the vCPUs must be stopped while a RAM snapshot is
> taken, which is an unfriendly restriction for users.
>
> Are there plans to implement a live RAM snapshot feature?
>
> As I see it, a snapshot must not occupy too much additional memory, so
> when a memory page is about to be changed, the old page has to be
> flushed to the file first. But flushing to a file is much slower than
> writing to memory, and the vCPU or VM has to be paused until the flush
> finishes. The result is pause...resume...pause...resume..., and the
> guest gets slower and slower.
>
> Is this idea feasible? Are there any other thoughts?

This looks very similar to postcopy migration (you can Google it). The infrastructure for postcopy migration could be used for this as well.

Paolo
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Anthony Liguori @ 2013-08-09 15:45 UTC
To: Chijianchun, paul@codesourcery.com, kvm@vger.kernel.org, avi@redhat.com, mtosatti@redhat.com, qemu-devel@nongnu.org

Chijianchun <chijianchun@huawei.com> writes:

> Currently in KVM, the vCPUs must be stopped while a RAM snapshot is
> taken, which is an unfriendly restriction for users.
>
> Are there plans to implement a live RAM snapshot feature?

I think you mean a live version of the savevm command.

You can approximate this by live migrating to a file, creating an external disk snapshot, and then resuming the guest.

Regards,

Anthony Liguori
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Eric Blake @ 2013-08-09 15:51 UTC
To: Anthony Liguori
Cc: Chijianchun, mtosatti@redhat.com, paul@codesourcery.com, kvm@vger.kernel.org, qemu-devel@nongnu.org

On 08/09/2013 09:45 AM, Anthony Liguori wrote:
> I think you mean a live version of the savevm command.
>
> You can approximate this by live migrating to a file, creating an
> external disk snapshot, and then resuming the guest.

And libvirt does just that, since libvirt 1.0.5, for its external RAM snapshots. The vCPU pause is a mere fraction of a second, so it is generally not noticeable as guest downtime.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Stefan Hajnoczi @ 2013-08-12 9:59 UTC
To: Chijianchun
Cc: aliguori@us.ibm.com, kvm@vger.kernel.org, mtosatti@redhat.com, qemu-devel@nongnu.org, paul@codesourcery.com, fred.konrad, xiawenc, avi@redhat.com

On Fri, Aug 09, 2013 at 10:20:49AM +0000, Chijianchun wrote:
> Are there plans to implement a live RAM snapshot feature?
>
> Is this idea feasible? Are there any other thoughts?

A few people have looked at live savevm or guest RAM snapshots.

The idea that was discussed on qemu-devel@nongnu.org uses fork(2) to capture the state of guest RAM and then send it back to the parent process. The guest is only paused for a brief instant during fork(2) and can continue to run afterwards.

The child process is a simple loop that sends the contents of guest RAM back to the parent process over a pipe, or writes the memory pages to the save file on disk. It performs no logic besides writing out guest RAM.

Stefan
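The fork(2) idea in the message above can be illustrated with a minimal sketch. This is not QEMU code: snapshot_ram_fork() and snapshot_ram_wait() are hypothetical helper names, the "guest RAM" is just an ordinary private buffer, and device state is ignored. It only shows that the child sees a copy-on-write view frozen at fork() time while the parent keeps modifying memory.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. Returns 0 on success. */
static int write_all(int fd, const uint8_t *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: fork a child that dumps [ram, ram + len) to `path`.
 * Returns the child's pid to the parent, or -1 if fork() failed.
 */
pid_t snapshot_ram_fork(const uint8_t *ram, size_t len, const char *path)
{
    pid_t pid = fork();
    if (pid != 0) {
        return pid; /* parent (or fork failure): the guest may resume now */
    }

    /* Child: sees a copy-on-write view of RAM frozen at fork() time. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || write_all(fd, ram, len) < 0 || close(fd) < 0) {
        _exit(1);
    }
    _exit(0); /* exit directly; never return into the parent's main loop */
}

/* Wait for the dump to finish. Returns 0 if the child exited cleanly. */
int snapshot_ram_wait(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The parent can overwrite the buffer immediately after snapshot_ram_fork() returns; the file still receives the contents as they were at the moment of the fork. The price is that every page the parent touches while the child is running is duplicated by the kernel.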
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Alex Bligh @ 2013-08-12 10:26 UTC
To: Stefan Hajnoczi, Chijianchun
Cc: aliguori, kvm, mtosatti, qemu-devel, avi, paul, Alex Bligh, xiawenc, fred.konrad

--On 12 August 2013 11:59:03 +0200 Stefan Hajnoczi <stefanha@gmail.com> wrote:

> The idea that was discussed on qemu-devel@nongnu.org uses fork(2) to
> capture the state of guest RAM and then send it back to the parent
> process. The guest is only paused for a brief instant during fork(2)
> and can continue to run afterwards.

How would you capture the state of emulated hardware which might not be in the guest RAM?

--
Alex Bligh
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Stefan Hajnoczi @ 2013-08-12 11:33 UTC
To: Alex Bligh
Cc: Anthony Liguori, kvm, Marcelo Tosatti, qemu-devel, Chijianchun, Avi Kivity, Paul Brook, Wayne Xia, fred.konrad

On Mon, Aug 12, 2013 at 12:26 PM, Alex Bligh <alex@alex.org.uk> wrote:
> How would you capture the state of emulated hardware which might not
> be in the guest RAM?

Exactly the same way savevm works today. It calls the devices' save functions, which serialize their state to the file.

The difference between today's savevm and the fork(2) approach is that QEMU does not need to wait for guest RAM to be written to the file before resuming the guest.

Stefan
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Wenchao Xia @ 2013-08-13 2:53 UTC
To: Stefan Hajnoczi
Cc: Anthony Liguori, kvm, Marcelo Tosatti, qemu-devel, Chijianchun, Avi Kivity, Alex Bligh, fred.konrad, Paul Brook

On 2013-8-12 19:33, Stefan Hajnoczi wrote:
> The difference between today's savevm and the fork(2) approach is that
> QEMU does not need to wait for guest RAM to be written to the file
> before resuming the guest.

I am worried about what the GLib documentation says:

"On Unix, the GLib mainloop is incompatible with fork(). Any program using the mainloop must either exec() or exit() from the child without returning to the mainloop."

There is another way to do it: intercept the writes in kvm.ko (or other kernel code). The key is intercepting memory changes. We can already do that in userspace in TCG mode, so we would only need to add the missing part for KVM mode.

Another benefit of this approach is that the amount of memory used can be controlled. For example, an ioctl() could set up a fixed-size buffer in which kernel code keeps the intercepted write data, which avoids switching back to userspace QEMU code too often. When the buffer is full, control always returns to userspace QEMU, which saves the data to disk.

I haven't checked exactly how Intel guest mode handles page faults, so I can't estimate the performance cost of switching between guest mode and root mode, but it should not be worse than fork().

--
Best Regards

Wenchao Xia
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Stefan Hajnoczi @ 2013-08-13 8:21 UTC
To: Wenchao Xia
Cc: Anthony Liguori, kvm, Marcelo Tosatti, qemu-devel, Chijianchun, Avi Kivity, Alex Bligh, fred.konrad, Paul Brook

On Tue, Aug 13, 2013 at 4:53 AM, Wenchao Xia <xiawenc@linux.vnet.ibm.com> wrote:
> I am worried about what the GLib documentation says:
>
> "On Unix, the GLib mainloop is incompatible with fork(). Any program
> using the mainloop must either exec() or exit() from the child without
> returning to the mainloop."

This is fine: the child just writes out the memory pages and exits. It never returns to the glib mainloop.

> There is another way to do it: intercept the writes in kvm.ko (or
> other kernel code). The key is intercepting memory changes. We can
> already do that in userspace in TCG mode, so we would only need to add
> the missing part for KVM mode.
>
> Another benefit of this approach is that the amount of memory used can
> be controlled. For example, an ioctl() could set up a fixed-size
> buffer in which kernel code keeps the intercepted write data, which
> avoids switching back to userspace QEMU code too often. When the
> buffer is full, control always returns to userspace QEMU, which saves
> the data to disk.
>
> I haven't checked exactly how Intel guest mode handles page faults, so
> I can't estimate the performance cost of switching between guest mode
> and root mode, but it should not be worse than fork().

The fork(2) approach is portable, covers both KVM and TCG, and doesn't require kernel changes. A kvm.ko kernel change also won't be supported on existing KVM hosts. These are big drawbacks, and the kernel approach would need to be significantly better than plain old fork(2) to make it worthwhile.

Stefan
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Wenchao Xia @ 2013-08-14 1:54 UTC
To: Stefan Hajnoczi
Cc: Anthony Liguori, kvm, Paul Brook, Marcelo Tosatti, qemu-devel, Chijianchun, Avi Kivity, Alex Bligh, fred.konrad

On 2013-8-13 16:21, Stefan Hajnoczi wrote:
> The fork(2) approach is portable, covers both KVM and TCG, and doesn't
> require kernel changes. A kvm.ko kernel change also won't be
> supported on existing KVM hosts. These are big drawbacks, and the
> kernel approach would need to be significantly better than plain old
> fork(2) to make it worthwhile.

I think the advantage is that memory usage is predictable, so a peak in memory usage can be avoided by always saving the changed pages first. fork() does not know which pages have changed. I am not sure whether this would be a serious issue when most of the server's memory is already in use, for example a 24G host running two 11G guests to provide powerful virtual servers.

--
Best Regards

Wenchao Xia
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Stefan Hajnoczi @ 2013-08-14 7:53 UTC
To: Wenchao Xia
Cc: Anthony Liguori, kvm, Paul Brook, Marcelo Tosatti, qemu-devel, Chijianchun, Avi Kivity, Alex Bligh, fred.konrad

On Wed, Aug 14, 2013 at 3:54 AM, Wenchao Xia <xiawenc@linux.vnet.ibm.com> wrote:
> I think the advantage is that memory usage is predictable, so a peak
> in memory usage can be avoided by always saving the changed pages
> first. fork() does not know which pages have changed. I am not sure
> whether this would be a serious issue when most of the server's memory
> is already in use, for example a 24G host running two 11G guests to
> provide powerful virtual servers.

Memory usage is predictable, but guest uptime is unpredictable because the guest waits until memory is written out. This defeats the point of "live" savevm. The guest may be stalled arbitrarily.

The fork child can minimize the chance of out-of-memory by using madvise(MADV_DONTNEED) after pages have been written out.

The way fork handles memory overcommit on Linux is configurable, but I guess that in a situation where memory runs out, the Out-of-Memory Killer will kill a process (probably QEMU, since it is hogging so much memory).

The risk of OOM can be avoided by running the traditional savevm, which stops the guest, instead of using "live" savevm.

The other option is to live migrate to a file, but the disadvantage there is that you cannot choose exactly when the state is saved: it happens sometime after live migration is initiated.

There are trade-offs with all the approaches. It depends on what is most important to you.

Stefan
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Alex Bligh @ 2013-08-14 8:13 UTC
To: Stefan Hajnoczi
Cc: Anthony Liguori, kvm, Marcelo Tosatti, qemu-devel, Chijianchun, Paul Brook, Alex Bligh, fred.konrad, Wenchao Xia, Avi Kivity

On 14 Aug 2013, at 08:53, Stefan Hajnoczi wrote:

> The fork child can minimize the chance of out-of-memory by using
> madvise(MADV_DONTNEED) after pages have been written out.

This may also be helpful (last clause) before starting writing.

       MADV_SEQUENTIAL
              Expect page references in sequential order. (Hence, pages
              in the given range can be aggressively read ahead, and may
              be freed soon after they are accessed.)

--
Alex Bligh
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
From: Wenchao Xia @ 2013-08-15 2:26 UTC
To: Stefan Hajnoczi
Cc: Anthony Liguori, kvm, Marcelo Tosatti, qemu-devel, Chijianchun, Paul Brook, Alex Bligh, fred.konrad, Avi Kivity

On 2013-8-14 15:53, Stefan Hajnoczi wrote:
> Memory usage is predictable, but guest uptime is unpredictable because
> the guest waits until memory is written out. This defeats the point
> of "live" savevm. The guest may be stalled arbitrarily.

I think it is adjustable. There is not much difference from fork(), except that you get more precise control over the changed pages. The kernel intercepts the change and stores the changed page in another page, similar to fork(). When the userspace QEMU code runs, it saves some pages to disk. The buffer acts as a kind of lubricant: when Buffer = MAX, it is equivalent to fork() and the guest runs more lively; when Buffer = 0, the guest runs less lively. I think this lets the user find a good balance point with a single parameter.

It is harder to implement; I just want to show the idea.

> The fork child can minimize the chance of out-of-memory by using
> madvise(MADV_DONTNEED) after pages have been written out.

There seems to be no way to make sure that the pages written out are the changed ones, so there is a good chance that a page just written is unchanged and still in use by the other QEMU process.

--
Best Regards

Wenchao Xia
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
  2013-08-15 2:26 ` Wenchao Xia
@ 2013-08-15 7:49 ` Stefan Hajnoczi
  2013-08-15 8:03 ` Wenchao Xia
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2013-08-15 7:49 UTC (permalink / raw)
  To: Wenchao Xia
  Cc: Anthony Liguori, kvm, Marcelo Tosatti, qemu-devel, Chijianchun, Paul Brook, Alex Bligh, fred.konrad, Avi Kivity

On Thu, Aug 15, 2013 at 10:26:36AM +0800, Wenchao Xia wrote:
> On 2013-8-14 15:53, Stefan Hajnoczi wrote:
> > On Wed, Aug 14, 2013 at 3:54 AM, Wenchao Xia <xiawenc@linux.vnet.ibm.com> wrote:
> >> On 2013-8-13 16:21, Stefan Hajnoczi wrote:
> >>
> >>> On Tue, Aug 13, 2013 at 4:53 AM, Wenchao Xia <xiawenc@linux.vnet.ibm.com>
> >>> wrote:
> >>>>
> >>>> On 2013-8-12 19:33, Stefan Hajnoczi wrote:
> >>>>
> >>>>> On Mon, Aug 12, 2013 at 12:26 PM, Alex Bligh <alex@alex.org.uk> wrote:
> >>>>>>
> >>>>>> --On 12 August 2013 11:59:03 +0200 Stefan Hajnoczi <stefanha@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> The idea that was discussed on qemu-devel@nongnu.org uses fork(2) to
> >>>>>>> capture the state of guest RAM and then send it back to the parent
> >>>>>>> process. The guest is only paused for a brief instant during fork(2)
> >>>>>>> and can continue to run afterwards.
> >>>>>>
> >>>>>> How would you capture the state of emulated hardware which might not
> >>>>>> be in the guest RAM?
> >>>>>
> >>>>> Exactly the same way vmsave works today. It calls the device's save
> >>>>> functions which serialize state to file.
> >>>>>
> >>>>> The difference between today's vmsave and the fork(2) approach is that
> >>>>> QEMU does not need to wait for guest RAM to be written to file before
> >>>>> resuming the guest.
> >>>>>
> >>>>> Stefan
> >>>>>
> >>>> I have a worry about what glib says:
> >>>>
> >>>> "On Unix, the GLib mainloop is incompatible with fork(). Any program
> >>>> using the mainloop must either exec() or exit() from the child without
> >>>> returning to the mainloop. "
> >>>
> >>> This is fine, the child just writes out the memory pages and exits.
> >>> It never returns to the glib mainloop.
> >>>
> >>>> There is another way to do it: intercept the write in kvm.ko(or other
> >>>> kernel code). Since the key is intercept the memory change, we can do
> >>>> it in userspace in TCG mode, thus we can add the missing part in KVM
> >>>> mode. Another benefit of this way is: the used memory can be
> >>>> controlled. For example, with ioctl(), set a buffer of a fixed size
> >>>> which keeps the intercepted write data by kernel code, which can avoid
> >>>> frequently switch back to user space qemu code. when it is full always
> >>>> return back to userspace's qemu code, let qemu code save the data into
> >>>> disk. I haven't check the exactly behavior of Intel guest mode about
> >>>> how to handle page fault, so can't estimate the performance caused by
> >>>> switching of guest mode and root mode, but it should not be worse than
> >>>> fork().
> >>>
> >>> The fork(2) approach is portable, covers both KVM and TCG, and doesn't
> >>> require kernel changes. A kvm.ko kernel change also won't be
> >>> supported on existing KVM hosts. These are big drawbacks and the
> >>> kernel approach would need to be significantly better than plain old
> >>> fork(2) to make it worthwhile.
> >>>
> >>> Stefan
> >>>
> >> I think advantage is memory usage is predictable, so memory usage
> >> peak can be avoided, by always save the changed pages first. fork()
> >> does not know which pages are changed. I am not sure if this would
> >> be a serious issue when server's memory is consumed much, for example,
> >> 24G host emulate 11G*2 guest to provide powerful virtual server.
> >
> > Memory usage is predictable but guest uptime is unpredictable because
> > it waits until memory is written out. This defeats the point of
> > "live" savevm. The guest may be stalled arbitrarily.
> >
> I think it is adjustable. There is no much difference with
> fork(), except get more precise control about the changed pages.
> Kernel intercept the change, and stores the changed page in another
> page, similar to fork(). When userspace qemu code execute, save some
> pages to disk. Buffer can be used like some lubricant. When Buffer =
> MAX, it equals to fork(), guest runs more lively. When Buffer = 0,
> guest runs less lively. I think it allows user to find a good balance
> point with a parameter.
> It is harder to implement, just want to show the idea.

You are right. You could set a bigger buffer size to increase guest
uptime.

> > The fork child can minimize the chance of out-of-memory by using
> > madvise(MADV_DONTNEED) after pages have been written out.
> It seems no way to make sure the written out page is the changed
> pages, so it have a good chance the written one is the unchanged and
> still used by the other qemu process.

The KVM dirty log tells you which pages were touched. The fork child
process could give priority to the pages which have been touched by the
guest. They must be written out and marked madvise(MADV_DONTNEED) as
soon as possible.

I haven't looked at the vmsave data format yet to see if memory pages
can be saved in random order, but this might work. It reduces the
likelihood of copy-on-write memory growth.

Stefan

^ permalink raw reply [flat|nested] 15+ messages in thread
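A minimal sketch of the fork(2) plus madvise(MADV_DONTNEED) idea discussed
above, under stated assumptions: guest RAM is modelled as one page-aligned
MAP_PRIVATE region, and the names snapshot_start() and snapshot_wait() are
hypothetical helpers, not QEMU functions. It only illustrates why the parent
can keep running while the child writes out its copy-on-write view; device
state and the real savevm stream format are not covered.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes madvise() and MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write all n bytes, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const uint8_t *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w <= 0)
            return -1;
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/*
 * Hypothetical helper: fork a child that writes the RAM contents as they
 * were at fork time to `path`. `ram` must be page-aligned private memory.
 * Returns the child's pid in the parent, or -1 if fork() failed.
 */
pid_t snapshot_start(uint8_t *ram, size_t len, const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    pid_t pid = fork();
    if (pid != 0)
        return pid; /* parent (or fork error): resume the guest at once */

    /* Child: never return to the caller's main loop, only _exit(). */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        _exit(1);
    for (size_t off = 0; off < len; off += (size_t)page) {
        size_t n = len - off < (size_t)page ? len - off : (size_t)page;
        if (write_all(fd, ram + off, n) < 0)
            _exit(1);
        /* Drop the child's reference so a copied page can be freed early. */
        madvise(ram + off, n, MADV_DONTNEED);
    }
    _exit(close(fd) == 0 ? 0 : 1);
}

/* Wait for the snapshot child. Returns 0 if it exited cleanly, else -1. */
int snapshot_wait(pid_t pid)
{
    int status;
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The parent would call snapshot_start(), let the guest continue writing to
RAM, and later call snapshot_wait(). Each page the guest dirties before the
child has written it costs one extra copy until the child releases it, which
is the memory growth the thread is concerned about.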
* Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
  2013-08-15 7:49 ` Stefan Hajnoczi
@ 2013-08-15 8:03 ` Wenchao Xia
  0 siblings, 0 replies; 15+ messages in thread
From: Wenchao Xia @ 2013-08-15 8:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Anthony Liguori, kvm, Marcelo Tosatti, qemu-devel, Chijianchun, Paul Brook, Alex Bligh, fred.konrad, Avi Kivity

On 2013-8-15 15:49, Stefan Hajnoczi wrote:
> On Thu, Aug 15, 2013 at 10:26:36AM +0800, Wenchao Xia wrote:
>> On 2013-8-14 15:53, Stefan Hajnoczi wrote:
>>> On Wed, Aug 14, 2013 at 3:54 AM, Wenchao Xia <xiawenc@linux.vnet.ibm.com> wrote:
>>>> On 2013-8-13 16:21, Stefan Hajnoczi wrote:
>>>>
>>>>> On Tue, Aug 13, 2013 at 4:53 AM, Wenchao Xia <xiawenc@linux.vnet.ibm.com>
>>>>> wrote:
>>>>>>
>>>>>> On 2013-8-12 19:33, Stefan Hajnoczi wrote:
>>>>>>
>>>>>>> On Mon, Aug 12, 2013 at 12:26 PM, Alex Bligh <alex@alex.org.uk> wrote:
>>>>>>>>
>>>>>>>> --On 12 August 2013 11:59:03 +0200 Stefan Hajnoczi <stefanha@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The idea that was discussed on qemu-devel@nongnu.org uses fork(2) to
>>>>>>>>> capture the state of guest RAM and then send it back to the parent
>>>>>>>>> process. The guest is only paused for a brief instant during fork(2)
>>>>>>>>> and can continue to run afterwards.
>>>>>>>>
>>>>>>>> How would you capture the state of emulated hardware which might not
>>>>>>>> be in the guest RAM?
>>>>>>>
>>>>>>> Exactly the same way vmsave works today. It calls the device's save
>>>>>>> functions which serialize state to file.
>>>>>>>
>>>>>>> The difference between today's vmsave and the fork(2) approach is that
>>>>>>> QEMU does not need to wait for guest RAM to be written to file before
>>>>>>> resuming the guest.
>>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>> I have a worry about what glib says:
>>>>>>
>>>>>> "On Unix, the GLib mainloop is incompatible with fork(). Any program
>>>>>> using the mainloop must either exec() or exit() from the child without
>>>>>> returning to the mainloop. "
>>>>>
>>>>> This is fine, the child just writes out the memory pages and exits.
>>>>> It never returns to the glib mainloop.
>>>>>
>>>>>> There is another way to do it: intercept the write in kvm.ko(or other
>>>>>> kernel code). Since the key is intercept the memory change, we can do
>>>>>> it in userspace in TCG mode, thus we can add the missing part in KVM
>>>>>> mode. Another benefit of this way is: the used memory can be
>>>>>> controlled. For example, with ioctl(), set a buffer of a fixed size
>>>>>> which keeps the intercepted write data by kernel code, which can avoid
>>>>>> frequently switch back to user space qemu code. when it is full always
>>>>>> return back to userspace's qemu code, let qemu code save the data into
>>>>>> disk. I haven't check the exactly behavior of Intel guest mode about
>>>>>> how to handle page fault, so can't estimate the performance caused by
>>>>>> switching of guest mode and root mode, but it should not be worse than
>>>>>> fork().
>>>>>
>>>>> The fork(2) approach is portable, covers both KVM and TCG, and doesn't
>>>>> require kernel changes. A kvm.ko kernel change also won't be
>>>>> supported on existing KVM hosts. These are big drawbacks and the
>>>>> kernel approach would need to be significantly better than plain old
>>>>> fork(2) to make it worthwhile.
>>>>>
>>>>> Stefan
>>>>>
>>>> I think advantage is memory usage is predictable, so memory usage
>>>> peak can be avoided, by always save the changed pages first. fork()
>>>> does not know which pages are changed. I am not sure if this would
>>>> be a serious issue when server's memory is consumed much, for example,
>>>> 24G host emulate 11G*2 guest to provide powerful virtual server.
>>>
>>> Memory usage is predictable but guest uptime is unpredictable because
>>> it waits until memory is written out. This defeats the point of
>>> "live" savevm. The guest may be stalled arbitrarily.
>>>
>> I think it is adjustable. There is no much difference with
>> fork(), except get more precise control about the changed pages.
>> Kernel intercept the change, and stores the changed page in another
>> page, similar to fork(). When userspace qemu code execute, save some
>> pages to disk. Buffer can be used like some lubricant. When Buffer =
>> MAX, it equals to fork(), guest runs more lively. When Buffer = 0,
>> guest runs less lively. I think it allows user to find a good balance
>> point with a parameter.
>> It is harder to implement, just want to show the idea.
>
> You are right. You could set a bigger buffer size to increase guest
> uptime.
>
>>> The fork child can minimize the chance of out-of-memory by using
>>> madvise(MADV_DONTNEED) after pages have been written out.
>> It seems no way to make sure the written out page is the changed
>> pages, so it have a good chance the written one is the unchanged and
>> still used by the other qemu process.
>
> The KVM dirty log tells you which pages were touched. The fork child
> process could give priority to the pages which have been touched by the
> guest. They must be written out and marked madvise(MADV_DONTNEED) as
> soon as possible.

Hmm, if the dirty log still works normally in the child process and
reflects the memory status of the parent rather than the child, then
the problem could be solved as follows: when there are too many dirty
pages, the child tells the parent to wait for some time. But I haven't
checked whether kvm.ko behaves like that.

>
> I haven't looked at the vmsave data format yet to see if memory pages
> can be saved in random order, but this might work. It reduces the
> likelihood of copy-on-write memory growth.
>
> Stefan
>

--
Best Regards

Wenchao Xia

^ permalink raw reply [flat|nested] 15+ messages in thread
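The dirty-log prioritization raised in this exchange could be sketched as
below. This is an illustration under assumptions, not kvm.ko or QEMU code:
it takes a bitmap with one bit per guest page (the shape a dirty log
provides) as plain input, and plan_write_order() is a hypothetical name.
Whether such a bitmap fetched by the child reflects the parent's writes, and
whether the save format permits out-of-order pages, are the open questions
from the thread and are simply assumed here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DIRTY_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Test the bit for `page` in a bitmap holding one bit per guest page. */
static int page_is_dirty(const unsigned long *bitmap, size_t page)
{
    return (int)((bitmap[page / DIRTY_BITS_PER_WORD] >>
                  (page % DIRTY_BITS_PER_WORD)) & 1UL);
}

/*
 * Hypothetical helper: fill order[0..npages) with page indices so that
 * pages marked dirty come first (ascending), followed by clean pages
 * (ascending). Returns the number of dirty pages.
 */
size_t plan_write_order(const unsigned long *bitmap, size_t npages,
                        size_t *order)
{
    size_t n = 0, dirty;

    assert(bitmap != NULL && order != NULL);

    /* First pass: pages the guest has touched are written out first. */
    for (size_t p = 0; p < npages; p++)
        if (page_is_dirty(bitmap, p))
            order[n++] = p;
    dirty = n;

    /* Second pass: untouched pages are still shared and can wait. */
    for (size_t p = 0; p < npages; p++)
        if (!page_is_dirty(bitmap, p))
            order[n++] = p;

    return dirty;
}
```

The returned dirty count could also drive the back-pressure idea above: if
it exceeds some chosen threshold, the child could ask the parent to pause
briefly. That signalling path is not shown.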
end of thread, other threads:[~2013-08-15 8:04 UTC | newest]

Thread overview: 15+ messages
2013-08-09 10:20 [Qemu-devel] Are there plans to achieve ram live Snapshot feature? Chijianchun
2013-08-09 15:38 ` Paolo Bonzini
2013-08-09 15:45 ` Anthony Liguori
2013-08-09 15:51 ` Eric Blake
2013-08-12 9:59 ` Stefan Hajnoczi
2013-08-12 10:26 ` Alex Bligh
2013-08-12 11:33 ` Stefan Hajnoczi
2013-08-13 2:53 ` Wenchao Xia
2013-08-13 8:21 ` Stefan Hajnoczi
2013-08-14 1:54 ` Wenchao Xia
2013-08-14 7:53 ` Stefan Hajnoczi
2013-08-14 8:13 ` Alex Bligh
2013-08-15 2:26 ` Wenchao Xia
2013-08-15 7:49 ` Stefan Hajnoczi
2013-08-15 8:03 ` Wenchao Xia