From: Li Zhijian
Date: Thu, 3 Dec 2015 18:23:58 +0800
Subject: Re: [Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
To: Hailiang Zhang, "Dr. David Alan Gilbert"
Cc: Juan Quintela, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, Amit Shah, david@gibson.dropbear.id.au
Message-ID: <5660183E.6050907@cn.fujitsu.com>
In-Reply-To: <56600D47.7050004@huawei.com>

On 12/03/2015 05:37 PM, Hailiang Zhang wrote:
> On 2015/12/3 17:24, Dr. David Alan Gilbert wrote:
>> * Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
>>> Hi all,
>>>
>>> Does anybody remember the similar issue posted by Hailiang months ago?
>>> http://patchwork.ozlabs.org/patch/454322/
>>> At least two migration bugs have been fixed since then.
>>
>> Yes, I wondered what happened to that.
>>
>>> And now we have found the same issue on a TCG VM (KVM is fine): after
>>> migration, the content of the VM's memory is inconsistent.
>>
>> Hmm, TCG only - I don't know much about that; but I guess something must
>> be accessing memory without using the proper macros/functions so
>> it doesn't mark it as dirty.
>>
>>> We added a patch to check the memory content; you can find it attached
>>> below.
>>>
>>> Steps to reproduce:
>>> 1) Apply the patch and rebuild QEMU.
>>> 2) Prepare the Ubuntu guest and run memtest from GRUB.
>>>
>>> Source side:
>>> x86_64-softmmu/qemu-system-x86_64 \
>>>     -netdev tap,id=hn0 \
>>>     -device e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 \
>>>     -boot c \
>>>     -drive if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 \
>>>     -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 \
>>>     -vnc :7 -m 128 -smp 1 \
>>>     -device piix3-usb-uhci -device usb-tablet \
>>>     -qmp tcp::4444,server,nowait -monitor stdio \
>>>     -cpu qemu64 -machine pc-i440fx-2.3,accel=tcg,usb=off
>>>
>>> Destination side:
>>> x86_64-softmmu/qemu-system-x86_64 \
>>>     -netdev tap,id=hn0 \
>>>     -device e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 \
>>>     -boot c \
>>>     -drive if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 \
>>>     -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 \
>>>     -vnc :7 -m 128 -smp 1 \
>>>     -device piix3-usb-uhci -device usb-tablet \
>>>     -qmp tcp::4444,server,nowait -monitor stdio \
>>>     -cpu qemu64 -machine pc-i440fx-2.3,accel=tcg,usb=off \
>>>     -incoming tcp:0:8881
>>>
>>> 3) Start the migration. With a 1000M NIC, migration finishes within
>>> 3 minutes.
>>>
>>> At the source:
>>> (qemu) migrate tcp:192.168.2.66:8881
>>> after saving ram complete
>>> e9e725df678d392b1a83b3a917f332bb
>>> qemu-system-x86_64: end ram md5
>>> (qemu)
>>>
>>> At the destination:
>>> ...skip...
>>> Completed load of VM with exit code 0 seq iteration 1264
>>> Completed load of VM with exit code 0 seq iteration 1265
>>> Completed load of VM with exit code 0 seq iteration 1266
>>> qemu-system-x86_64: after loading state section id 2(ram)
>>> 49c2dac7bde0e5e22db7280dcb3824f9
>>> qemu-system-x86_64: end ram md5
>>> qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
>>>
>>> 49c2dac7bde0e5e22db7280dcb3824f9
>>> qemu-system-x86_64: end ram md5
>>>
>>> This occurs occasionally and only on TCG machines. It seems that some
>>> pages dirtied on the source side are not transferred to the destination.
>>> This problem can be reproduced even if we disable virtio.
>>>
>>> Is it OK for some pages not to be transferred to the destination during
>>> migration? Or is it a bug?
>>
>> I'm pretty sure that means it's a bug. Hard to find though; I guess
>> at least memtest is smaller than a big OS. I think I'd dump the whole
>> of memory on both sides, hexdump and diff them - I'd guess it would
>> just be one byte/word different, maybe that would offer some idea what
>> wrote it.
>>
>
> Maybe a better way to do that is with the help of userfaultfd's
> write-protect capability. It is still under development by Andrea
> Arcangeli, but an RFC version is available; please refer to
> http://www.spinics.net/lists/linux-mm/msg97422.html
> (I'm developing a live memory snapshot feature based on it; maybe this
> is another scenario where we can use userfaultfd's WP ;) ).

Sounds good, thanks.

Li

>
>> Dave
>>
>>> Any idea...
>>>
>>> =================md5 check patch=============================
>>>
>>> diff --git a/Makefile.target b/Makefile.target
>>> index 962d004..e2cb8e9 100644
>>> --- a/Makefile.target
>>> +++ b/Makefile.target
>>> @@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
>>>  obj-y += memory_mapping.o
>>>  obj-y += dump.o
>>>  obj-y += migration/ram.o migration/savevm.o
>>> -LIBS := $(libs_softmmu) $(LIBS)
>>> +LIBS := $(libs_softmmu) $(LIBS) -lplumb
>>>
>>>  # xen support
>>>  obj-$(CONFIG_XEN) += xen-common.o
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index 1eb155a..3b7a09d 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>>      }
>>>
>>>      rcu_read_unlock();
>>> -    DPRINTF("Completed load of VM with exit code %d seq iteration "
>>> +    fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
>>>              "%" PRIu64 "\n", ret, seq_iter);
>>>      return ret;
>>>  }
>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>> index 0ad1b93..3feaa61 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
>>>
>>>  }
>>>
>>> +#include "exec/ram_addr.h"
>>> +#include "qemu/rcu_queue.h"
>>> +#include
>>> +#ifndef MD5_DIGEST_LENGTH
>>> +#define MD5_DIGEST_LENGTH 16
>>> +#endif
>>> +
>>> +static void check_host_md5(void)
>>> +{
>>> +    int i;
>>> +    unsigned char md[MD5_DIGEST_LENGTH];
>>> +    rcu_read_lock();
>>> +    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
>>> +    rcu_read_unlock();
>>> +
>>> +    MD5(block->host, block->used_length, md);
>>> +    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
>>> +        fprintf(stderr, "%02x", md[i]);
>>> +    }
>>> +    fprintf(stderr, "\n");
>>> +    error_report("end ram md5");
>>> +}
>>> +
>>>  void qemu_savevm_state_begin(QEMUFile *f,
>>>                               const MigrationParams *params)
>>>  {
>>> @@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
>>>          save_section_header(f, se, QEMU_VM_SECTION_END);
>>>
>>>          ret = se->ops->save_live_complete_precopy(f, se->opaque);
>>> +
>>> +        fprintf(stderr, "after saving %s complete\n", se->idstr);
>>> +        check_host_md5();
>>> +
>>>          trace_savevm_section_end(se->idstr, se->section_id, ret);
>>>          save_section_footer(f, se);
>>>          if (ret < 0) {
>>> @@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>>>                               section_id, le->se->idstr);
>>>                  return ret;
>>>              }
>>> +            if (section_type == QEMU_VM_SECTION_END) {
>>> +                error_report("after loading state section id %d(%s)",
>>> +                             section_id, le->se->idstr);
>>> +                check_host_md5();
>>> +            }
>>>              if (!check_section_footer(f, le)) {
>>>                  return -EINVAL;
>>>              }
>>> @@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
>>>      }
>>>
>>>      cpu_synchronize_all_post_init();
>>> +    error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
>>> +    check_host_md5();
>>>
>>>      return ret;
>>>  }
>>>
>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>
>
-- 
Best regards.
Li Zhijian (8555)