From: Wen Congyang <wency@cn.fujitsu.com>
To: quintela@redhat.com, zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: hangaohuai@huawei.com, Li Zhijian <lizhijian@cn.fujitsu.com>,
qemu-devel@nongnu.org, peter.huangpeng@huawei.com,
"Gonglei (Arei)" <arei.gonglei@huawei.com>,
Amit Shah <amit.shah@redhat.com>,
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>,
david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
Date: Wed, 25 Mar 2015 18:21:17 +0800
Message-ID: <55128C1D.5080002@cn.fujitsu.com>
In-Reply-To: <87a8z12yot.fsf@neno.neno>
On 03/25/2015 05:50 PM, Juan Quintela wrote:
> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>> Hi all,
>>
>> We found that sometimes the content of the VM's memory is inconsistent between the source and destination sides
>> when we check it just after migration finishes but before the VM continues to run.
>>
>> We used the patch below (see the attachment) to find this issue.
>> Steps to reproduce:
>>
>> (1) Compile QEMU:
>> ./configure --target-list=x86_64-softmmu --extra-ldflags="-lssl" && make
>>
>> (2) Command and output:
>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>
> Could you try to reproduce:
> - without vhost
> - without virtio-net
> - cache=unsafe is going to give you trouble, but that trouble should only
> happen after the migration of pages has finished.
I can use e1000 to reproduce this problem.
>
> What kind of load were you having when reproducing this issue?
> Just to confirm, you have been able to reproduce this without COLO
> patches, right?
I can reproduce it without COLO patches. The newest commit is:
commit 054903a832b865eb5432d79b5c9d1e1ff31b58d7
Author: Peter Maydell <peter.maydell@linaro.org>
Date: Tue Mar 24 16:34:16 2015 +0000
Update version for v2.3.0-rc1 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>
>> (qemu) migrate tcp:192.168.3.8:3004
>> before saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>> md_host : after saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>>
>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>> (qemu) QEMU_VM_SECTION_END, after loading ram
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after loading all vmstate
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after cpu_synchronize_all_post_init
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>
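The md_host lines above print a digest of guest RAM at several points on each side. A minimal sketch of that kind of check follows, assuming guest RAM is a single flat buffer and a 4 KiB page size. It uses 64-bit FNV-1a as a stand-in for the MD5 digest the debug patch appears to compute via OpenSSL (-lssl); the names ram_digest, page_digests, diff_digests and SKETCH_PAGE_SIZE are hypothetical. Per-page digests let each host produce a small table, so the two tables can be compared to find which pages differ rather than only whether the whole of RAM differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size for this sketch (x86 guests use 4 KiB pages). */
#define SKETCH_PAGE_SIZE 4096u

/* 64-bit FNV-1a hash, standing in for the MD5 digest of the debug patch. */
static uint64_t fnv1a64(const uint8_t *buf, size_t len)
{
    uint64_t h = 14695981039346656037ULL;    /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL;               /* FNV prime */
    }
    return h;
}

/* Digest of the whole RAM buffer, analogous to one "md_host" line. */
static uint64_t ram_digest(const uint8_t *ram, size_t len)
{
    return fnv1a64(ram, len);
}

/* Fill out[] with one digest per page; out must hold len / page-size entries. */
static void page_digests(const uint8_t *ram, size_t len, uint64_t *out)
{
    assert(len % SKETCH_PAGE_SIZE == 0);
    for (size_t p = 0; p < len / SKETCH_PAGE_SIZE; p++) {
        out[p] = fnv1a64(ram + p * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
    }
}

/* Compare two digest tables (e.g. one from each host). Returns the number
 * of differing pages and stores up to max_out of their indices in out. */
static size_t diff_digests(const uint64_t *a, const uint64_t *b,
                           size_t npages, size_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t p = 0; p < npages; p++) {
        if (a[p] != b[p]) {
            if (n < max_out) {
                out[n] = p;
            }
            n++;
        }
    }
    return n;
}
```

Running page_digests on the source after the final RAM pass and on the destination after loading, then feeding both tables to diff_digests, yields the indices of the pages that differ; those indices can then be checked against the migration bitmap.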
>> This happens occasionally, and it is easier to reproduce when the migration command is issued during the VM's startup.
>
> OK, a couple of things. Memory doesn't have to be exactly identical.
> Virtio devices in particular do funny things on "post-load". There
> aren't guarantees for that as far as I know; we should end up with an
> equivalent device state in memory.
>
>> We have done further tests and found that some pages have been dirtied but their corresponding bits in migration_bitmap are not set.
>> We can't figure out which QEMU module fails to set the bitmap when it dirties a VM page;
>> it is very difficult for us to trace every action that dirties the VM's pages.
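Why a missed bitmap update leaves a stale page on the destination can be shown with a toy model. This is a sketch under stated assumptions, not QEMU code: struct sketch_vm, the dirty array standing in for migration_bitmap, the helper functions, and the 64-byte page size are all hypothetical. Every page is sent once, then only pages whose dirty bit is set are resent; a write that changes RAM without setting its bit is never resent, so that page differs on the destination.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NPAGES 8u
#define SKETCH_PSIZE  64u   /* tiny hypothetical page size */

struct sketch_vm {
    uint8_t ram[SKETCH_NPAGES][SKETCH_PSIZE];
    bool dirty[SKETCH_NPAGES];   /* stands in for migration_bitmap */
};

/* Well-behaved write path: changes guest RAM and marks the page dirty. */
static void tracked_write(struct sketch_vm *vm, size_t page, size_t off, uint8_t val)
{
    vm->ram[page][off] = val;
    vm->dirty[page] = true;
}

/* Buggy write path: the page changes but its dirty bit is never set. */
static void untracked_write(struct sketch_vm *vm, size_t page, size_t off, uint8_t val)
{
    vm->ram[page][off] = val;
}

/* First pass: mark every page dirty so each one is sent at least once. */
static void start_migration(struct sketch_vm *src)
{
    for (size_t p = 0; p < SKETCH_NPAGES; p++) {
        src->dirty[p] = true;
    }
}

/* Copy each dirty page to the destination and clear its bit.
 * Returns the number of pages sent in this pass. */
static size_t send_dirty_pages(struct sketch_vm *src, struct sketch_vm *dst)
{
    size_t sent = 0;
    for (size_t p = 0; p < SKETCH_NPAGES; p++) {
        if (src->dirty[p]) {
            memcpy(dst->ram[p], src->ram[p], SKETCH_PSIZE);
            src->dirty[p] = false;
            sent++;
        }
    }
    return sent;
}

/* Count pages whose contents differ between the two sides. */
static size_t count_mismatched_pages(const struct sketch_vm *a, const struct sketch_vm *b)
{
    size_t n = 0;
    for (size_t p = 0; p < SKETCH_NPAGES; p++) {
        if (memcmp(a->ram[p], b->ram[p], SKETCH_PSIZE) != 0) {
            n++;
        }
    }
    return n;
}
```

With one tracked and one untracked write after the first pass, the second pass resends only the tracked page, and exactly one page (the untracked one) ends up different: the page was dirtied but its bit was never set.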
>
> This seems to point to a bug in one of the devices.
>
>> Actually, we first found this problem during COLO FT development, where it triggered some strange issues in the
>> VM that all pointed to inconsistent VM memory. (We tried saving all of the VM's memory to the slave side at every
>> COLO FT checkpoint, and then everything was OK.)
>>
>> Is it OK for some pages not to be transferred to the destination during migration? Or is it a bug?
>
> Pages transferred should be the same; things could change only after the
> device state has been transmitted.
>
>> This issue has blocked our COLO development... :(
>>
>> Any help will be greatly appreciated!
>
> Later, Juan.
>