From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: peter.huangpeng@huawei.com,
	Baptiste Reynal <b.reynal@virtualopensystems.com>,
	qemu list <qemu-devel@nongnu.org>,
	hanweidong@huawei.com, Juan Quintela <quintela@redhat.com>,
	dgilbert@redhat.com, Amit Shah <amit.shah@redhat.com>,
	Christian Pinto <c.pinto@virtualopensystems.com>
Subject: Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Date: Tue, 6 Sep 2016 11:39:41 +0800
Message-ID: <57CE3A7D.3030404@huawei.com>
In-Reply-To: <20160818155636.l46t4ha65eybnnhe@redhat.com>

Hi Andrea,

I tested it with the new live memory snapshot code under --enable-kvm, and it doesn't work.

To keep things simple, I stripped the code down to only the parts that exercise
the write-protect capability. You can find it at
https://github.com/coloft/qemu/tree/test-userfault-write-protect.
The problem is easy to reproduce with it.

The test result is as follows:
[root@localhost qemu]# x86_64-softmmu/qemu-system-x86_64 --enable-kvm -drive file=/mnt/sdb/win7/win7.qcow2,if=none,id=drive-ide0-0-1,format=qcow2,cache=none  -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0  --monitor stdio
QEMU 2.6.95 monitor - type 'help' for more information
(qemu) migrate file:/home/xxx
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
error: kvm run failed Bad address
EAX=00000004 EBX=00000000 ECX=83b2ac20 EDX=0000c022
ESI=85fe33f4 EDI=0000c020 EBP=83b2abcc ESP=83b2abc0
EIP=8bd2ff0c EFL=00010293 [--S-A-C] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0023 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0023 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
FS =0030 83b2dc00 00003748 00409300 DPL=0 DS   [-WA]
GS =0000 00000000 ffffffff 00000000
LDT=0000 00000000 ffffffff 00000000
TR =0028 801e2000 000020ab 00008b00 DPL=0 TSS32-busy
GDT=     80b95000 000003ff
IDT=     80b95400 000007ff
CR0=8001003b CR2=030b5000 CR3=00185000 CR4=000006f8
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000800
Code=8b ff 55 8b ec 53 56 8b 75 08 57 8b 7e 34 56 e8 30 f7 ff ff <6a> 00 57 8a d8 e8 96 14 00 00 6a 04 83 c7 02 57 e8 8b 14 00 00 5f c6 46 5b 00 5e 8a c3 5b

I investigated the KVM and userfault code. KVM uses MMU notifiers to integrate with Linux
memory management.

For userfault write-protect, the call path is:
userfaultfd_ioctl
   -> userfaultfd_writeprotect
     -> mwriteprotect_range
       -> change_protection (calls the mprotect helper directly)
         -> change_protection_range
           -> change_pud_range
             -> change_pmd_range
               -> mmu_notifier_invalidate_range_start(mm, mni_start, end);
                 -> kvm_mmu_notifier_invalidate_range_start (KVM module)
So at this point the spte is removed. (With EPT hardware, the page table
entry for it is removed.)
That is why we get the fault notification for the VM.
It also seems that we cannot resolve the userfault (that is, remove the page's
write protection) through this call path.

My question is: for the userfault write-protect capability, why do we remove the page table
entry instead of marking it read-only?
KVM already has an MMU notifier (kvm_mmu_notifier_change_pte) for this.
We could use it to clear the write permission in the KVM page table, just as KVM dirty log
tracking does. Please see __rmap_write_protect() in KVM.
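
To illustrate the difference I am asking about, here is a minimal user-space
sketch. It is a toy model only, not kernel or KVM code: toy_pte, TOY_PRESENT,
TOY_WRITABLE and all toy_* helpers are hypothetical names made up for this
mail. It only shows that a removed entry faults on reads and writes alike,
while a read-only entry faults on writes only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical bits, not KVM's real spte layout. */
#define TOY_PRESENT  (1u << 0)
#define TOY_WRITABLE (1u << 1)

typedef uint32_t toy_pte;

/* Models dropping the entry entirely, as an invalidate does. */
static inline toy_pte toy_zap(toy_pte pte)
{
    (void)pte;
    return 0;
}

/* Models keeping the mapping but clearing the write permission. */
static inline toy_pte toy_write_protect(toy_pte pte)
{
    return pte & ~TOY_WRITABLE;
}

/* Returns true if an access through this entry would fault. */
static inline bool toy_access_faults(toy_pte pte, bool is_write)
{
    if (!(pte & TOY_PRESENT))
        return true;    /* not mapped: every access faults */
    return is_write && !(pte & TOY_WRITABLE);
}

/* Walks through both strategies on one mapped, writable entry. */
static inline void toy_demo(void)
{
    toy_pte pte = TOY_PRESENT | TOY_WRITABLE;

    /* After a zap, even a read faults. */
    assert(toy_access_faults(toy_zap(pte), false));
    assert(toy_access_faults(toy_zap(pte), true));

    /* After write-protect, reads still succeed and only writes fault. */
    assert(!toy_access_faults(toy_write_protect(pte), false));
    assert(toy_access_faults(toy_write_protect(pte), true));
}
```

Calling toy_demo() from a main() runs the checks. In the model, only the
write-protect variant gives a fault that is specific to writes.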

Another question: does mprotect() work correctly with KVM? (I didn't test it.) I think
KSM and swap work properly with KVM.

Besides, there seems to be a bug in userfault write-protect.
userfaultfd_writeprotect() checks UFFDIO_COPY_MODE_DONTWAKE; should it be
UFFDIO_WRITEPROTECT_MODE_DONTWAKE instead?

static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
				    unsigned long arg)
{
	... ...

	if (!(uffdio_wp.mode & UFFDIO_COPY_MODE_DONTWAKE)) {
		range.start = uffdio_wp.range.start;
		range.len = uffdio_wp.range.len;
		wake_userfault(ctx, &range);
	}
	return ret;
}
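
To show why the choice of flag could matter, here is a small user-space
sketch. The DEMO_* constants are local copies that I define only for
illustration. Their values follow what I believe mainline
<linux/userfaultfd.h> uses; that the userfault branch uses the same values is
an assumption. The demo_* helpers are hypothetical. If the values match, the
COPY flag shares its bit with the WP mode bit, so the check as written follows
the WP bit and ignores a DONTWAKE request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copies for illustration only. Values are assumed to match
 * <linux/userfaultfd.h>; the userfault branch may differ.
 */
#define DEMO_COPY_MODE_DONTWAKE          ((uint64_t)1 << 0)
#define DEMO_WRITEPROTECT_MODE_WP        ((uint64_t)1 << 0)
#define DEMO_WRITEPROTECT_MODE_DONTWAKE  ((uint64_t)1 << 1)

/* Wake decision as written in the snippet above (tests the COPY flag). */
static inline bool demo_wakes_as_written(uint64_t mode)
{
    return !(mode & DEMO_COPY_MODE_DONTWAKE);
}

/* Wake decision with the flag proposed in the question. */
static inline bool demo_wakes_as_proposed(uint64_t mode)
{
    return !(mode & DEMO_WRITEPROTECT_MODE_DONTWAKE);
}

static inline void demo_flag_mixup(void)
{
    /* Caller explicitly asks for no wakeup. */
    uint64_t quiet = DEMO_WRITEPROTECT_MODE_DONTWAKE;
    assert(demo_wakes_as_written(quiet));    /* request is ignored */
    assert(!demo_wakes_as_proposed(quiet));  /* request is honoured */

    /* Caller sets only the WP bit, without DONTWAKE. */
    uint64_t wp = DEMO_WRITEPROTECT_MODE_WP;
    assert(!demo_wakes_as_written(wp));      /* decision follows the WP bit */
    assert(demo_wakes_as_proposed(wp));
}
```

Under these assumed values the two checks disagree for both inputs, which is
why I suspect the COPY flag is a copy-and-paste leftover.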

Thanks.
Hailiang

On 2016/8/18 23:56, Andrea Arcangeli wrote:
> Hello everyone,
>
> I've an aa.git tree uptodate on the master & userfault branch (master
> includes other pending VM stuff, userfault branch only contains
> userfault enhancements):
>
> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
>
> I didn't have time to test KVM live memory snapshot on it yet as I'm
> still working to improve it. Did anybody test it? However I'd be happy
> to take any bugreports and quickly solve anything that isn't working
> right with the shadow MMU.
>
> I got positive report already for another usage of the uffd WP support:
>
> https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f
>
> The last few things I'm working on to finish the WP support are:
>
> 1) pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty to mark in a
>     vma->vm_flags with VM_UFFD_WP set, which swap entries were
>     generated while the pte was wrprotected.
>
> 2) to avoid all false positives the equivalent of pte_mksoft_dirty is
>     needed too... and that requires spare software bits on the pte
>     which are available on x86. I considered also taking over the
>     soft_dirty bit but then you couldn't do checkpoint restore of a
>     JIT/to-native compiler that uses uffd WP support so it wasn't
>     ideal. Perhaps it would be ok as an incremental patch to make the
>     two options mutually exclusive to defer the arch changes that
>     pte_mkuffd_wp would require for later.
>
> 3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING or trigger a
>     cow in userfaultfd_writeprotect.
>
> 4) WP selftest
>
> In theory things should work ok already if the userland code is
> tolerant against false positives through swap and after fork() and
> KSM. For an usage like snapshotting false positives shouldn't be an
> issue (it'll just run slower if you swap in the worst case), and point
> 3) above also isn't an issue because it's going to register into uffd
> with WP only.
>
> The current status includes:
>
> 1) WP support for anon (with false positives.. work in progress)
>
> 2) MISSING support for tmpfs and hugetlbfs
>
> 3) non cooperative support
>
> Thanks,
> Andrea
