From: Wen Congyang <wency@cn.fujitsu.com>
To: quintela@redhat.com, Kevin Wolf <kwolf@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Michael Tsirkin <mst@redhat.com>
Cc: hangaohuai@huawei.com,
zhanghailiang <zhang.zhanghailiang@huawei.com>,
Li Zhijian <lizhijian@cn.fujitsu.com>,
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>,
qemu-devel@nongnu.org, "Gonglei (Arei)" <arei.gonglei@huawei.com>,
Amit Shah <amit.shah@redhat.com>,
peter.huangpeng@huawei.com, david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
Date: Thu, 2 Apr 2015 17:14:48 +0800
Message-ID: <551D0888.3000705@cn.fujitsu.com>
In-Reply-To: <874mp8xd9k.fsf@neno.neno>
On 03/26/2015 06:29 PM, Juan Quintela wrote:
> Wen Congyang <wency@cn.fujitsu.com> wrote:
>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>>>> Hi all,
>>>>
>>>> We found that, sometimes, the content of VM's memory is
>>>> inconsistent between Source side and Destination side
>>>> when we check it just after finishing migration but before the VM continues to run.
>>>>
>>>> We use a patch like the one below to find this issue; you can find it in the attachment.
>>>> Steps to reproduce:
>>>>
>>>> (1) Compile QEMU:
>>>> ./configure --target-list=x86_64-softmmu --extra-ldflags="-lssl" && make
>>>>
>>>> (2) Command and output:
>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>> qemu64,-kvmclock -netdev tap,id=hn0 -device
>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>> -device
>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>> -monitor stdio
>>>
>>> Could you try to reproduce:
>>> - without vhost
>>> - without virtio-net
>>> - cache=unsafe is going to give you trouble, but trouble should only
>>> happen after migration of pages have finished.
>>
>> If I use ide disk, it doesn't happen.
>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>> it is because I migrate the guest when it is booting. The virtio net
>> device is not used in this case.
>
> Kevin, Stefan, Michael, any great idea?
The following patch fixes this problem (with vhost=off):
From ebc024702dd3147e0cbdfd173c599103dc87796c Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Thu, 2 Apr 2015 16:28:17 +0800
Subject: [PATCH] fix qiov size
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
hw/block/virtio-blk.c | 15 +++++++++++++++
include/hw/virtio/virtio-blk.h | 1 +
2 files changed, 16 insertions(+)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 000c38d..13967bc 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -33,6 +33,7 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
req->dev = s;
req->qiov.size = 0;
+ req->size = 0;
req->next = NULL;
req->mr_next = NULL;
return req;
@@ -97,12 +98,20 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
* external iovec. It was allocated in submit_merged_requests
* to be able to merge requests. */
qemu_iovec_destroy(&req->qiov);
+
+ /* Restore qiov->size here */
+ req->qiov.size = req->size;
}
if (ret) {
int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
bool is_read = !(p & VIRTIO_BLK_T_OUT);
if (virtio_blk_handle_rw_error(req, -ret, is_read)) {
+ /*
+ * FIXME:
+ * The memory may be dirtied on read failure, it will
+ * break live migration.
+ */
continue;
}
}
@@ -323,6 +332,12 @@ static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb,
struct iovec *tmp_iov = qiov->iov;
int tmp_niov = qiov->niov;
+ /*
+ * Save old qiov->size, which will be used in
+ * virtio_blk_complete_request()
+ */
+ mrb->reqs[start]->size = qiov->size;
+
/* mrb->reqs[start]->qiov was initialized from external so we can't
* modifiy it here. We need to initialize it locally and then add the
* external iovecs. */
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index b3ffcd9..7d47310 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -67,6 +67,7 @@ typedef struct VirtIOBlockReq {
struct virtio_blk_inhdr *in;
struct virtio_blk_outhdr out;
QEMUIOVector qiov;
+ size_t size;
struct VirtIOBlockReq *next;
struct VirtIOBlockReq *mr_next;
BlockAcctCookie acct;
--
2.1.0
PS: I haven't checked whether virtio-scsi, virtio-net, etc. have a similar problem.
With vhost=on, we can also reproduce this problem.
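
To make the idea behind the patch easier to follow, here is a minimal, self-contained sketch of the failure mode. It is not QEMU code: ToyIOVector, ToyReq, toy_iovec_destroy(), toy_mark_dirty() and toy_run() are hypothetical stand-ins, and it assumes that destroying the locally built iovec of a merged request resets its size to 0, and that the completion path uses qiov.size to decide how much guest memory is reported as written (and therefore marked dirty).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NUM_PAGES 8u

/* Hypothetical stand-ins for QEMUIOVector and VirtIOBlockReq. */
typedef struct { size_t size; } ToyIOVector;
typedef struct {
    ToyIOVector qiov;
    size_t size;          /* saved copy of qiov.size, as added by the patch */
} ToyReq;

/* One byte per guest page; 1 means "must be resent by migration". */
static uint8_t toy_dirty_bitmap[TOY_NUM_PAGES];

/* Mark every page touched by [start, start + len) as dirty. */
static void toy_mark_dirty(size_t start, size_t len)
{
    if (len == 0) {
        return;           /* nothing reported as written, nothing marked */
    }
    assert(start + len <= (size_t)TOY_NUM_PAGES * TOY_PAGE_SIZE);
    for (size_t p = start / TOY_PAGE_SIZE;
         p <= (start + len - 1) / TOY_PAGE_SIZE; p++) {
        toy_dirty_bitmap[p] = 1;
    }
}

/* Assumption: destroying the iovec also clears its recorded size. */
static void toy_iovec_destroy(ToyIOVector *qiov)
{
    qiov->size = 0;
}

static size_t toy_count_dirty(void)
{
    size_t n = 0;
    for (size_t p = 0; p < TOY_NUM_PAGES; p++) {
        n += toy_dirty_bitmap[p];
    }
    return n;
}

/*
 * Simulate a merged read of read_len bytes into guest address 0 and
 * return how many pages end up marked dirty.
 */
static size_t toy_run(size_t read_len, int apply_fix)
{
    ToyReq req = { .qiov = { .size = read_len }, .size = 0 };

    memset(toy_dirty_bitmap, 0, sizeof(toy_dirty_bitmap));

    req.size = req.qiov.size;        /* save before the iovec is rebuilt */
    toy_iovec_destroy(&req.qiov);    /* completion frees the local iovec */
    if (apply_fix) {
        req.qiov.size = req.size;    /* restore, as the patch does */
    }
    toy_mark_dirty(0, req.qiov.size);
    return toy_count_dirty();
}
```

Without the restore step, toy_run() reports no dirty pages even though the read filled guest memory, which matches the symptom of pages being modified without the migration bitmap being set. With the restore step, every page covered by the read is marked.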
>
> Thanks, Juan.
>
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>> What kind of load were you having when reproducing this issue?
>>> Just to confirm, you have been able to reproduce this without COLO
>>> patches, right?
>>>
>>>> (qemu) migrate tcp:192.168.3.8:3004
>>>> before saving ram complete
>>>> ff703f6889ab8701e4e040872d079a28
>>>> md_host : after saving ram complete
>>>> ff703f6889ab8701e4e040872d079a28
>>>>
>>>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>> qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device
>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>> -device
>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>> -monitor stdio -incoming tcp:0:3004
>>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>> md_host : after loading all vmstate
>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>> md_host : after cpu_synchronize_all_post_init
>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>
>>>> This happens occasionally, and it is easier to reproduce when
>>>> issuing the migration command during the VM's startup.
>>>
>>> OK, a couple of things. Memory doesn't have to be exactly identical.
>>> Virtio devices in particular do funny things on "post-load". There
>>> aren't guarantees for that as far as I know; we should end up with an
>>> equivalent device state in memory.
>>>
>>>> We have done further tests and found that some pages have been
>>>> dirtied but their corresponding migration_bitmap bits are not set.
>>>> We can't figure out which module of QEMU fails to set the bitmap
>>>> when dirtying the VM's pages;
>>>> it is very difficult for us to trace all the actions that dirty the VM's pages.
>>>
>>> This seems to point to a bug in one of the devices.
>>>
>>>> Actually, the first time we found this problem was in the COLO FT
>>>> development, and it triggered some strange issues in
>>>> the VM which all pointed to inconsistency of the VM's
>>>> memory. (We have tried saving all memory of the VM to the slave side every
>>>> time
>>>> we do a checkpoint in COLO FT, and then everything is OK.)
>>>>
>>>> Is it OK for some pages not to be transferred to the destination during
>>>> migration? Or is it a bug?
>>>
>>> Pages transferred should be the same, after device state transmission is
>>> when things could change.
>>>
>>>> This issue has blocked our COLO development... :(
>>>>
>>>> Any help will be greatly appreciated!
>>>
>>> Later, Juan.
>>>
> .
>