From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: peter.huangpeng@huawei.com,
Baptiste Reynal <b.reynal@virtualopensystems.com>,
qemu list <qemu-devel@nongnu.org>,
hanweidong@huawei.com, Juan Quintela <quintela@redhat.com>,
dgilbert@redhat.com, Amit Shah <amit.shah@redhat.com>,
Christian Pinto <c.pinto@virtualopensystems.com>
Subject: Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Date: Sun, 18 Sep 2016 10:14:20 +0800
Message-ID: <57DDF87C.1070506@huawei.com>
In-Reply-To: <57CE3A7D.3030404@huawei.com>
Hi Andrea,
Any comments?
Thanks.
On 2016/9/6 11:39, Hailiang Zhang wrote:
> Hi Andrea,
>
> I tested it with the new live memory snapshot code using --enable-kvm, and it doesn't work.
>
> To keep things simple, I stripped the code down to only what is needed to test
> the write-protect capability. You can find it at
> https://github.com/coloft/qemu/tree/test-userfault-write-protect.
> You can easily reproduce the problem with it.
>
> Test result as follows:
> [root@localhost qemu]# x86_64-softmmu/qemu-system-x86_64 --enable-kvm -drive file=/mnt/sdb/win7/win7.qcow2,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 --monitor stdio
> QEMU 2.6.95 monitor - type 'help' for more information
> (qemu) migrate file:/home/xxx
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
> error: kvm run failed Bad address
> EAX=00000004 EBX=00000000 ECX=83b2ac20 EDX=0000c022
> ESI=85fe33f4 EDI=0000c020 EBP=83b2abcc ESP=83b2abc0
> EIP=8bd2ff0c EFL=00010293 [--S-A-C] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0023 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
> CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
> SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0023 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
> FS =0030 83b2dc00 00003748 00409300 DPL=0 DS [-WA]
> GS =0000 00000000 ffffffff 00000000
> LDT=0000 00000000 ffffffff 00000000
> TR =0028 801e2000 000020ab 00008b00 DPL=0 TSS32-busy
> GDT= 80b95000 000003ff
> IDT= 80b95400 000007ff
> CR0=8001003b CR2=030b5000 CR3=00185000 CR4=000006f8
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000800
> Code=8b ff 55 8b ec 53 56 8b 75 08 57 8b 7e 34 56 e8 30 f7 ff ff <6a> 00 57 8a d8 e8 96 14 00 00 6a 04 83 c7 02 57 e8 8b 14 00 00 5f c6 46 5b 00 5e 8a c3 5b
>
> I investigated the KVM and userfault code. KVM uses MMU notifiers to integrate with Linux
> memory management.
>
> For userfault write-protect, the call path is:
> userfaultfd_ioctl
> -> userfaultfd_writeprotect
> -> mwriteprotect_range
> -> change_protection (calls the mprotect helper directly here)
> -> change_protection_range
> -> change_pud_range
> -> change_pmd_range
> -> mmu_notifier_invalidate_range_start(mm, mni_start, end);
> -> kvm_mmu_notifier_invalidate_range_start (KVM module)
> Here the corresponding spte is removed (with EPT hardware, the EPT page table
> entry is removed). That is why the VM gets fault notifications.
> But it seems that the userfault cannot be resolved (i.e., the page's write
> protection cannot be removed) through this call path.
>
> My question is: for the userfault write-protect capability, why do we remove the page table
> entry instead of marking it read-only?
> KVM already has an MMU notifier (kvm_mmu_notifier_change_pte) that could do this:
> we could use it to remove write permission from the KVM page table, just as KVM dirty-log
> tracking does. See __rmap_write_protect() in KVM.
>
> Another question: does mprotect() work normally with KVM? (I didn't test it.) I think
> KSM and swap work with KVM properly.
>
> Besides, there seems to be a bug in userfault write-protect:
> userfaultfd_writeprotect() tests UFFDIO_COPY_MODE_DONTWAKE. Should it be
> UFFDIO_WRITEPROTECT_MODE_DONTWAKE there?
>
> static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
>                                     unsigned long arg)
> {
>         ... ...
>
>         if (!(uffdio_wp.mode & UFFDIO_COPY_MODE_DONTWAKE)) {
>                 range.start = uffdio_wp.range.start;
>                 range.len = uffdio_wp.range.len;
>                 wake_userfault(ctx, &range);
>         }
>         return ret;
> }
>
> Thanks.
> Hailiang
>
> On 2016/8/18 23:56, Andrea Arcangeli wrote:
>> Hello everyone,
>>
>> I have an aa.git tree up to date on the master and userfault branches (master
>> includes other pending VM work; the userfault branch contains only
>> userfault enhancements):
>>
>> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
>>
>> I haven't had time to test KVM live memory snapshot on it yet, as I'm
>> still working to improve it. Has anybody tested it? In any case, I'd be happy
>> to take any bug reports and quickly fix anything that isn't working
>> right with the shadow MMU.
>>
>> I already got a positive report for another use of the uffd WP support:
>>
>> https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f
>>
>> The last few things I'm working on to finish the WP support are:
>>
>> 1) a pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty, to mark, in a
>> vma with VM_UFFD_WP set in vma->vm_flags, which swap entries were
>> generated while the pte was wrprotected.
>>
>> 2) to avoid all false positives, the equivalent of pte_mksoft_dirty is
>> needed too, and that requires spare software bits in the pte,
>> which are available on x86. I also considered taking over the
>> soft_dirty bit, but then you couldn't checkpoint/restore a
>> JIT/to-native compiler that uses uffd WP support, so it wasn't
>> ideal. Perhaps it would be OK as an incremental patch to make the
>> two options mutually exclusive, deferring for later the arch changes that
>> pte_mkuffd_wp would require.
>>
>> 3) prevent UFFDIO_ZEROPAGE when registering WP|MISSING, or trigger a
>> COW in userfaultfd_writeprotect.
>>
>> 4) WP selftest
>>
>> In theory things should already work if the userland code is
>> tolerant of false positives through swap, after fork(), and with
>> KSM. For a use like snapshotting, false positives shouldn't be an
>> issue (in the worst case it'll just run slower if you swap), and point
>> 3) above also isn't an issue because it registers into uffd
>> with WP only.
>>
>> The current status includes:
>>
>> 1) WP support for anon memory (with false positives; work in progress)
>>
>> 2) MISSING support for tmpfs and hugetlbfs
>>
>> 3) non-cooperative support
>>
>> Thanks,
>> Andrea
>>