From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
aarcange@redhat.com, quintela@redhat.com, amit.shah@redhat.com,
hanweidong@huawei.com
Subject: Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Date: Tue, 19 Jul 2016 14:53:53 +0800
Message-ID: <578DCE81.1060407@huawei.com>
In-Reply-To: <20160714114342.GB2077@work-vm>
On 2016/7/14 19:43, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2016/7/14 2:02, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> For now, we still don't support live memory snapshots. We discussed
>>>> a scheme based on userfaultfd a long time ago.
>>>> You can find the discussion at the following link:
>>>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
>>>>
>>>> The scheme is based on userfaultfd's write-protect capability.
>>>> The userfaultfd write protection feature is available here:
>>>> http://www.spinics.net/lists/linux-mm/msg97422.html
>>>
>>> I've (finally!) had a brief look through this, I like the idea.
>>> I've not bothered with minor cleanup like comments on them;
>>> I'm sure those will happen later; some larger scale things to think
>>> about are:
>>> a) I wonder if it's really best to put that much code into the postcopy
>>> function; it might be but I can see other userfault uses as well.
>>
>> Yes, it is better to extract the common code into public functions.
>>
>>> b) I worry a bit about the size of the copies you create during setup
>>> and I don't really understand why you can't start sending those pages
>>
>> Because we save the device state and RAM in the same snapshot_thread. If saving
>> the device state is blocked by a write to a protected page, we can remove the
>> write-protect in the 'postcopy/fault' thread, but can't send the page immediately.
>
> Don't you write the devices to a buffer? If so then perhaps you could split
> writing into that buffer into a separate thread.
>
Hmm, it may work in this way.
>>> immediately - but then I worry about the relative order of when page
>>> data should be sent compared to the state of the devices' view of RAM.
>>> c) Have you considered also using userfault for loading the snapshot - I
>>> know there was someone on #qemu a while ago who was talking about using
>>> it as a way to quickly reload from a migration image.
>>>
>>
>> I didn't notice that discussion before; maybe I missed it.
>> Could you please send me the link?
>
> I don't think there's any public docs about it; this was a conversation
> with Christoph Seifert on #qemu about May last year.
>
Got it.
>> But I did consider the scenario of quickly restoring a snapshot.
>> The difficulty here is how to quickly find the position
>> of a specific page. That is, while the VM is accessing a page, we
>> need to find its position in the snapshot file and read it into memory.
>> For compatibility, we hope we can still reuse all migration
>> capabilities.
>>
>> My rough idea about the scenario is:
>> 1. Use an array to record the starting position of each of the VM's pages.
>> Use the page offset as the index into the array, just like the migration bitmaps.
>> 2. Save the array's data into another file in a special format.
>> 3. Also record the position of the device state data in the snapshot file.
>> (Or we can put the device state data at the head of the snapshot file.)
>> 4. While restoring the snapshot, reload the array first, and then read
>> the device state.
>> 5. Set all pages to MISS status.
>> 6. Resume the VM.
>> 7. The rest of the process is like postcopy incoming.
>>
>> I'm not sure if this scenario is practicable or not. We need further
>> discussion. :)
>
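For illustration only (this is not code from the patch series), the page-position array from steps 1 to 4 might be sketched like this in Python. The names `write_snapshot`, `load_index`, `read_page`, `PAGE_SIZE` and `NO_PAGE` are hypothetical, and the on-disk layout is an assumption. Step 2 proposes a separate file for the array; here it is appended to the same stream, with a small fixed-size trailer, purely to keep the example self-contained.

```python
import io
import struct

PAGE_SIZE = 4096               # assumed guest page size
NO_PAGE = 0xFFFFFFFFFFFFFFFF   # hypothetical marker: page not present in the file


def write_snapshot(f, device_state, pages, num_pages):
    """Write device state, then pages, then the offset array and a trailer.

    `pages` maps page number -> PAGE_SIZE bytes and may be in any order.
    """
    index = [NO_PAGE] * num_pages
    f.write(struct.pack("<Q", len(device_state)))
    f.write(device_state)              # device state at the head of the file
    for page_no, data in pages.items():
        index[page_no] = f.tell()      # record where this page starts
        f.write(data)
    index_pos = f.tell()
    f.write(struct.pack("<%dQ" % num_pages, *index))
    # Trailer: where the offset array starts and how many entries it has.
    f.write(struct.pack("<QQ", index_pos, num_pages))


def load_index(f):
    """Reload the offset array by reading the trailer at the end of the file."""
    f.seek(-16, io.SEEK_END)
    index_pos, num_pages = struct.unpack("<QQ", f.read(16))
    f.seek(index_pos)
    return list(struct.unpack("<%dQ" % num_pages, f.read(8 * num_pages)))


def read_page(f, index, page_no):
    """Fetch a single page on demand, as a fault handler would."""
    offset = index[page_no]
    if offset == NO_PAGE:
        return bytes(PAGE_SIZE)        # never-saved page: assume all zeroes
    f.seek(offset)
    return f.read(PAGE_SIZE)
```

On restore, only `load_index` and the device state need to be read before the VM resumes; each later fault then costs one array lookup and one read.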
> Yes; I can think of a few different ways to do (2):
> a) We could just store it at the end of the snapshot file (and know that
> it's at the end - I think the json format description did a similar trick).
Yes, this is a better idea.
> b) We wouldn't need the 4 byte headers on the page we currently send.
> c) Juan's idea of having multiple fd's for migration streams might also fit,
> with the RAM data in the separate file.
> d) But if we know it's a file (not a network stream) then should we treat it
> specially and just use a sparse file of the same size as RAM, and just
> pwrite() the data into the right offset?
>
Yes, this is the simplest way to save the snapshot file. The disadvantage is
that we can't directly reuse the current migration incoming path to restore
the VM (for a non-quick restore); we would need to modify the current restore
process. I'm not sure which way is better, but it's worth a try.
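
As a rough sketch of option (d), assuming a POSIX host and a fixed page size, the save and restore sides could look like the following in Python. The helper names (`create_sparse_snapshot`, `save_page`, `load_page`) and `PAGE_SIZE` are hypothetical and only illustrate the offset arithmetic; nothing here is taken from the series.

```python
import os

PAGE_SIZE = 4096  # assumed guest page size


def create_sparse_snapshot(path, ram_size):
    """Create a file as large as guest RAM without writing any data blocks."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    os.ftruncate(fd, ram_size)
    return fd


def save_page(fd, page_no, data):
    """Write one page at its RAM offset; the order of saves does not matter."""
    assert len(data) == PAGE_SIZE
    os.pwrite(fd, data, page_no * PAGE_SIZE)


def load_page(fd, page_no):
    """Read one page back; the file offset is simply the RAM offset."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)
```

With this layout no per-page header or index is needed, which is also why the existing stream-based incoming code could not read the file unchanged.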
Hailiang
> Dave
>
>>
>> Hailiang
>>
>>> Dave
>>>
>>>>
>>>> The process of this live memory snapshot scheme is as below:
>>>> 1. Pause VM
>>>> 2. Enable write-protect fault notification by using userfaultfd to
>>>> mark VM's memory to write-protect (readonly).
>>>> 3. Save VM's static state (here is device state) to snapshot file
>>>> 4. Resume VM, VM is going to run.
>>>> 5. Snapshot thread begins to save VM's live state (here is RAM) into
>>>> snapshot file.
>>>> 6. During this time, all writes to the VM's memory will be blocked
>>>> by the kernel, and the kernel will wake up the fault-handling thread in qemu to
>>>> process the write-protect fault. The fault-handling thread will deliver the
>>>> page's address to the snapshot thread.
>>>> 7. The snapshot thread gets this address, saves the page into the snapshot file,
>>>> and then removes the write-protect using the userfaultfd API; after that,
>>>> the blocked write resumes.
>>>> 8. Repeat steps 5~7 until all of the VM's memory is saved to the snapshot file.
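
To make the ordering in steps 2 and 5 to 8 concrete, here is a toy, single-threaded model in Python. It does not use userfaultfd at all: the class `SnapshotSim` and its methods are hypothetical, a set of page numbers stands in for the kernel's write protection, and a dict stands in for the snapshot file. It only demonstrates the invariant that every page is copied out before its first post-resume write.

```python
PAGE_SIZE = 4096  # assumed guest page size


class SnapshotSim:
    """Toy model: protect every page, save each page before its first write."""

    def __init__(self, ram):
        self.ram = ram                          # list of bytearray pages (guest RAM)
        self.protected = set(range(len(ram)))   # step 2: all pages write-protected
        self.saved = {}                         # stands in for the snapshot file

    def _save_and_unprotect(self, page_no):
        # Step 7: copy the page out, then drop its write protection.
        self.saved[page_no] = bytes(self.ram[page_no])
        self.protected.discard(page_no)

    def guest_write(self, page_no, offset, value):
        # Step 6: a write to a protected page "faults"; the page is saved first.
        if page_no in self.protected:
            self._save_and_unprotect(page_no)
        self.ram[page_no][offset] = value

    def background_step(self):
        # Step 5: the snapshot thread saves one still-protected page per call.
        if not self.protected:
            return False
        self._save_and_unprotect(min(self.protected))
        return True
```

Interleaving `guest_write` and `background_step` calls in any order leaves `saved` equal to the RAM contents at the moment protection was enabled, which is the "photo" property described in this cover letter.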
>>>>
>>>> Compared with the feature of 'migrate VM's state to file',
>>>> the main difference for live memory snapshot is that it has little delay in
>>>> capturing the VM's state. It captures the VM's state at the moment the user
>>>> issues the snapshot command, just like taking a photo of the VM's state.
>>>>
>>>> For now, we only support the tcg accelerator, since userfaultfd does not
>>>> support tracking write faults for KVM.
>>>>
>>>> Usage:
>>>> 1. Take a snapshot
>>>> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 --monitor stdio
>>>> Issue snapshot command:
>>>> (qemu)migrate -d file:/home/Snapshot
>>>> 2. Revert to the snapshot
>>>> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 --monitor stdio -incoming file:/home/Snapshot
>>>>
>>>> NOTE:
>>>> The userfaultfd write protection feature does not support THP for now.
>>>> Before taking a snapshot, please disable THP with:
>>>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>>>
>>>> TODO:
>>>> - Reduce the influence for VM while taking snapshot
>>>>
>>>> zhanghailiang (13):
>>>> postcopy/migration: Split fault related state into struct
>>>> UserfaultState
>>>> migration: Allow the migrate command to work on file: urls
>>>> migration: Allow -incoming to work on file: urls
>>>> migration: Create a snapshot thread to realize saving memory snapshot
>>>> migration: implement initialization work for snapshot
>>>> QEMUSizedBuffer: Introduce two help functions for qsb
>>>> savevm: Split qemu_savevm_state_complete_precopy() into two helper
>>>> functions
>>>> snapshot: Save VM's device state into snapshot file
>>>> migration/postcopy-ram: fix some helper functions to support
>>>> userfaultfd write-protect
>>>> snapshot: Enable the write-protect notification capability for VM's
>>>> RAM
>>>> snapshot/migration: Save VM's RAM into snapshot file
>>>> migration/ram: Fix some helper functions' parameter to use
>>>> PageSearchStatus
>>>> snapshot: Remove page's write-protect and copy the content during
>>>> setup stage
>>>>
>>>> include/migration/migration.h | 41 +++++--
>>>> include/migration/postcopy-ram.h | 9 +-
>>>> include/migration/qemu-file.h | 3 +-
>>>> include/qemu/typedefs.h | 1 +
>>>> include/sysemu/sysemu.h | 3 +
>>>> linux-headers/linux/userfaultfd.h | 21 +++-
>>>> migration/fd.c | 51 ++++++++-
>>>> migration/migration.c | 101 ++++++++++++++++-
>>>> migration/postcopy-ram.c | 229 ++++++++++++++++++++++++++++----------
>>>> migration/qemu-file-buf.c | 61 ++++++++++
>>>> migration/ram.c | 104 ++++++++++++-----
>>>> migration/savevm.c | 90 ++++++++++++---
>>>> trace-events | 1 +
>>>> 13 files changed, 587 insertions(+), 128 deletions(-)
>>>>
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
Thread overview: 48+ messages
2016-01-07 12:19 [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd zhanghailiang
2016-01-07 12:19 ` [Qemu-devel] [RFC 01/13] postcopy/migration: Split fault related state into struct UserfaultState zhanghailiang
2016-01-07 12:19 ` [Qemu-devel] [RFC 02/13] migration: Allow the migrate command to work on file: urls zhanghailiang
2016-07-13 16:12 ` Dr. David Alan Gilbert
2016-07-14 5:27 ` Hailiang Zhang
2016-01-07 12:19 ` [Qemu-devel] [RFC 03/13] migration: Allow -incoming " zhanghailiang
2016-01-11 20:02 ` Dr. David Alan Gilbert
2016-01-12 13:04 ` Hailiang Zhang
2016-01-07 12:19 ` [Qemu-devel] [RFC 04/13] migration: Create a snapshot thread to realize saving memory snapshot zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 05/13] migration: implement initialization work for snapshot zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 06/13] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 07/13] savevm: Split qemu_savevm_state_complete_precopy() into two helper functions zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 08/13] snapshot: Save VM's device state into snapshot file zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 09/13] migration/postcopy-ram: fix some helper functions to support userfaultfd write-protect zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 10/13] snapshot: Enable the write-protect notification capability for VM's RAM zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 11/13] snapshot/migration: Save VM's RAM into snapshot file zhanghailiang
2016-01-07 12:20 ` [Qemu-devel] [RFC 12/13] migration/ram: Fix some helper functions' parameter to use PageSearchStatus zhanghailiang
2016-01-11 17:55 ` Dr. David Alan Gilbert
2016-01-12 12:59 ` Hailiang Zhang
2016-01-07 12:20 ` [Qemu-devel] [RFC 13/13] snapshot: Remove page's write-protect and copy the content during setup stage zhanghailiang
2016-07-13 17:52 ` Dr. David Alan Gilbert
2016-07-14 8:02 ` Hailiang Zhang
2016-07-04 12:22 ` [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd Baptiste Reynal
2016-07-05 1:49 ` Hailiang Zhang
2016-07-05 9:57 ` Baptiste Reynal
2016-07-05 10:27 ` Hailiang Zhang
2016-08-18 15:56 ` Andrea Arcangeli
2016-08-20 6:31 ` Hailiang Zhang
2017-02-27 15:37 ` Christian Pinto
2017-02-28 1:48 ` Hailiang Zhang
2017-02-28 8:30 ` Christian Pinto
2017-02-28 16:14 ` Andrea Arcangeli
2017-03-01 1:08 ` Hailiang Zhang
2017-03-09 11:34 ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live " Christian Pinto
2017-03-09 11:34 ` [Qemu-devel] [RFC PATCH 1/4] migration/postcopy-ram: check pagefault flags in userfaultfd thread Christian Pinto
2017-03-09 11:34 ` [Qemu-devel] [RFC PATCH 2/4] migration/ram: Fix for ARM/ARM64 page size Christian Pinto
2017-03-09 11:34 ` [Qemu-devel] [RFC PATCH 3/4] migration: snapshot thread Christian Pinto
2017-03-09 11:34 ` [Qemu-devel] [RFC PATCH 4/4] migration/postcopy-ram: ram_set_pages_wp fix Christian Pinto
2017-03-09 17:46 ` [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd Dr. David Alan Gilbert
2017-03-10 8:15 ` Christian Pinto
2016-09-06 3:39 ` [Qemu-devel] [RFC 00/13] Live " Hailiang Zhang
2016-09-18 2:14 ` Hailiang Zhang
2016-12-08 12:45 ` Hailiang Zhang
2016-07-05 14:59 ` Andrea Arcangeli
2016-07-13 18:02 ` Dr. David Alan Gilbert
2016-07-14 10:24 ` Hailiang Zhang
2016-07-14 11:43 ` Dr. David Alan Gilbert
2016-07-19 6:53 ` Hailiang Zhang [this message]