From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Juan Quintela <quintela@redhat.com>,
peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
Amit Shah <amit.shah@redhat.com>,
david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
Date: Thu, 3 Dec 2015 17:37:11 +0800 [thread overview]
Message-ID: <56600D47.7050004@huawei.com> (raw)
In-Reply-To: <20151203092404.GC2591@work-vm>
On 2015/12/3 17:24, Dr. David Alan Gilbert wrote:
> * Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
>> Hi all,
>>
>> Does anyboday remember the similar issue post by hailiang months ago
>> http://patchwork.ozlabs.org/patch/454322/
>> At least tow bugs about migration had been fixed since that.
>
> Yes, I wondered what happened to that.
>
>> And now we found the same issue at the tcg vm(kvm is fine), after migration,
>> the content VM's memory is inconsistent.
>
> Hmm, TCG only - I don't know much about that; but I guess something must
> be accessing memory without using the proper macros/functions so
> it doesn't mark it as dirty.
>
>> we add a patch to check memory content, you can find it from affix
>>
>> steps to reporduce:
>> 1) apply the patch and re-build qemu
>> 2) prepare the ubuntu guest and run memtest in grub.
>> soruce side:
>> x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
>> e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
>> if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>> -vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
>> tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
>> pc-i440fx-2.3,accel=tcg,usb=off
>>
>> destination side:
>> x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
>> e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
>> if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>> -vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
>> tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
>> pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
>>
>> 3) start migration
>> with 1000M NIC, migration will finish within 3 min.
>>
>> at source:
>> (qemu) migrate tcp:192.168.2.66:8881
>> after saving ram complete
>> e9e725df678d392b1a83b3a917f332bb
>> qemu-system-x86_64: end ram md5
>> (qemu)
>>
>> at destination:
>> ...skip...
>> Completed load of VM with exit code 0 seq iteration 1264
>> Completed load of VM with exit code 0 seq iteration 1265
>> Completed load of VM with exit code 0 seq iteration 1266
>> qemu-system-x86_64: after loading state section id 2(ram)
>> 49c2dac7bde0e5e22db7280dcb3824f9
>> qemu-system-x86_64: end ram md5
>> qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
>>
>> 49c2dac7bde0e5e22db7280dcb3824f9
>> qemu-system-x86_64: end ram md5
>>
>> This occurs occasionally and only at tcg machine. It seems that
>> some pages dirtied in source side don't transferred to destination.
>> This problem can be reproduced even if we disable virtio.
>>
>> Is it OK for some pages that not transferred to destination when do
>> migration ? Or is it a bug?
>
> I'm pretty sure that means it's a bug. Hard to find though, I guess
> at least memtest is smaller than a big OS. I think I'd dump the whole
> of memory on both sides, hexdump and diff them - I'd guess it would
> just be one byte/word different, maybe that would offer some idea what
> wrote it.
>
Maybe one better way to do that is with the help of userfaultfd's write-protect
capability. It is still in the development by Andrea Arcangeli, but there
is a RFC version available, please refer to http://www.spinics.net/lists/linux-mm/msg97422.html
(I'm developing live memory snapshot which based on it, maybe this is another scene where we
can use userfaultfd's WP ;) ).
> Dave
>
>> Any idea...
>>
>> =================md5 check patch=============================
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 962d004..e2cb8e9 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
>> obj-y += memory_mapping.o
>> obj-y += dump.o
>> obj-y += migration/ram.o migration/savevm.o
>> -LIBS := $(libs_softmmu) $(LIBS)
>> +LIBS := $(libs_softmmu) $(LIBS) -lplumb
>>
>> # xen support
>> obj-$(CONFIG_XEN) += xen-common.o
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 1eb155a..3b7a09d 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int
>> version_id)
>> }
>>
>> rcu_read_unlock();
>> - DPRINTF("Completed load of VM with exit code %d seq iteration "
>> + fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
>> "%" PRIu64 "\n", ret, seq_iter);
>> return ret;
>> }
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 0ad1b93..3feaa61 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
>>
>> }
>>
>> +#include "exec/ram_addr.h"
>> +#include "qemu/rcu_queue.h"
>> +#include <clplumbing/md5.h>
>> +#ifndef MD5_DIGEST_LENGTH
>> +#define MD5_DIGEST_LENGTH 16
>> +#endif
>> +
>> +static void check_host_md5(void)
>> +{
>> + int i;
>> + unsigned char md[MD5_DIGEST_LENGTH];
>> + rcu_read_lock();
>> + RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check
>> 'pc.ram' block */
>> + rcu_read_unlock();
>> +
>> + MD5(block->host, block->used_length, md);
>> + for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
>> + fprintf(stderr, "%02x", md[i]);
>> + }
>> + fprintf(stderr, "\n");
>> + error_report("end ram md5");
>> +}
>> +
>> void qemu_savevm_state_begin(QEMUFile *f,
>> const MigrationParams *params)
>> {
>> @@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f,
>> bool iterable_only)
>> save_section_header(f, se, QEMU_VM_SECTION_END);
>>
>> ret = se->ops->save_live_complete_precopy(f, se->opaque);
>> +
>> + fprintf(stderr, "after saving %s complete\n", se->idstr);
>> + check_host_md5();
>> +
>> trace_savevm_section_end(se->idstr, se->section_id, ret);
>> save_section_footer(f, se);
>> if (ret < 0) {
>> @@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f,
>> MigrationIncomingState *mis)
>> section_id, le->se->idstr);
>> return ret;
>> }
>> + if (section_type == QEMU_VM_SECTION_END) {
>> + error_report("after loading state section id %d(%s)",
>> + section_id, le->se->idstr);
>> + check_host_md5();
>> + }
>> if (!check_section_footer(f, le)) {
>> return -EINVAL;
>> }
>> @@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
>> }
>>
>> cpu_synchronize_all_post_init();
>> + error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
>> + check_host_md5();
>>
>> return ret;
>> }
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
next prev parent reply other threads:[~2015-12-03 9:37 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-03 7:32 [Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration Li Zhijian
2015-12-03 9:24 ` Dr. David Alan Gilbert
2015-12-03 9:37 ` Hailiang Zhang [this message]
2015-12-03 10:23 ` Li Zhijian
2015-12-03 10:23 ` Li Zhijian
2015-12-03 11:22 ` Dr. David Alan Gilbert
2015-12-03 11:20 ` Juan Quintela
2015-12-04 1:43 ` Li, Liang Z
2015-12-17 6:07 ` Amit Shah
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56600D47.7050004@huawei.com \
--to=zhang.zhanghailiang@huawei.com \
--cc=amit.shah@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=dgilbert@redhat.com \
--cc=lizhijian@cn.fujitsu.com \
--cc=peter.huangpeng@huawei.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).