From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42223)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1Ycuf1-0004e4-Gu
	for qemu-devel@nongnu.org; Tue, 31 Mar 2015 07:48:50 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1Ycuey-00046g-9p
	for qemu-devel@nongnu.org; Tue, 31 Mar 2015 07:48:47 -0400
Received: from szxga03-in.huawei.com ([119.145.14.66]:1726)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1Ycuex-00045O-7S
	for qemu-devel@nongnu.org; Tue, 31 Mar 2015 07:48:44 -0400
Message-ID: <551A897E.4090909@huawei.com>
Date: Tue, 31 Mar 2015 19:48:14 +0800
From: zhanghailiang <zhang.zhanghailiang@huawei.com>
MIME-Version: 1.0
References: <55128084.2040304@huawei.com> <87a8z12yot.fsf@neno.neno>
	<5513793B.6020909@cn.fujitsu.com>
	<5513826D.2010505@cn.fujitsu.com> <55152D5B.1090906@huawei.com>
	<20150327101831.GB2685@work-vm> <55167A60.9030602@huawei.com>
	<20150330075934.GA2474@work-vm>
In-Reply-To: <20150330075934.GA2474@work-vm>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [Migration Bug? ] Occasionally,
 the content of VM's memory is inconsistent between Source and
 Destination of migration
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, quintela@redhat.com
Cc: hangaohuai@huawei.com, Li Zhijian <lizhijian@cn.fujitsu.com>, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, "Gonglei (Arei)" <arei.gonglei@huawei.com>, Amit Shah <amit.shah@redhat.com>, david@gibson.dropbear.id.au

On 2015/3/30 15:59, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/3/27 18:18, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> On 2015/3/26 11:52, Li Zhijian wrote:
>>>>> On 03/26/2015 11:12 AM, Wen Congyang wrote:
>>>>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>>>>> zhanghailiang<zhang.zhanghailiang@huawei.com>  wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
>>>>>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>>>>>
>>>>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>>>>> and Steps to reprduce:
>>>>>>>>
>>>>>>>> (1) Compile QEMU:
>>>>>>>>   ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>>>>
>>>>>>>> (2) Command and output:
>>>>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>>>>>>> Could you try to reproduce:
>>>>>>> - without vhost
>>>>>>> - without virtio-net
>>>>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>>>>    happen after migration of pages have finished.
>>>>>> If I use ide disk, it doesn't happen.
>>>>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>>>>> it is because I migrate the guest when it is booting. The virtio net
>>>>>> device is not used in this case.
>>>>> Er??????
>>>>> it reproduces in my ide disk
>>>>> there is no any virtio device, my command line like below
>>>>>
>>>>> x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
>>>>> -boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
>>>>> usb=off -no-user-config -nodefaults -monitor stdio -vga std
>>>>>
>>>>> it seems easily to reproduce this issue by following steps in _ubuntu_ guest
>>>>> 1.  in source side, choose memtest in grub
>>>>> 2. do live migration
>>>>> 3. exit memtest(type Esc in when memory testing)
>>>>> 4. wait migration complete
>>>>>
>>>>
>>>> Yes???it is a thorny problem. It is indeed easy to reproduce, just as
>>>> your steps in the above.
>>>>
>>>> This is my test result: (I also test accel=tcg, it can be reproduced also.)
>>>> Source side:
>>>> # x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
>>>> (qemu) ACPI_BUILD: init ACPI tables
>>>> ACPI_BUILD: init ACPI tables
>>>> migrate tcp:9.61.1.8:3004
>>>> ACPI_BUILD: init ACPI tables
>>>> before cpu_synchronize_all_states
>>>> 5a8f72d66732cac80d6a0d5713654c0e
>>>> md_host : before saving ram complete
>>>> 5a8f72d66732cac80d6a0d5713654c0e
>>>> md_host : after saving ram complete
>>>> 5a8f72d66732cac80d6a0d5713654c0e
>>>> (qemu)
>>>>
>>>> Destination side:
>>>> # x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio -incoming tcp:0:3004
>>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>>> d7cb0d8a4bdd1557fb0e78baee50c986
>>>> md_host : after loading all vmstate
>>>> d7cb0d8a4bdd1557fb0e78baee50c986
>>>> md_host : after cpu_synchronize_all_post_init
>>>> d7cb0d8a4bdd1557fb0e78baee50c986
>>>
>>> Hmm, that's not good.  I suggest you md5 each of the RAMBlock's individually;
>>> to see if it's main RAM that's different or something more subtle like
>>> video RAM.
>>>
>>
>> Er, all my previous tests are md5 'pc.ram' block only.
>>
>>> But then maybe it's easier just to dump the whole of RAM to file
>>> and byte compare it (hexdump the two dumps and diff ?)
>>
>> Hmm, we also used memcmp function to compare every page, but the addresses
>> seem to be random.
>>
>> Besides, in our previous test, we found it seems to be more easy to reproduce
>> when migration occurs during VM's start-up or reboot process.
>>
>> Is there any possible that some devices have special treatment when VM start-up
>> which may miss setting dirty-bitmap ?
>
> I don't think there should be, but the code paths used during startup are
> probably much less tested with migration.  I'm sure the startup code
> uses different part of device emulation.   I do know we have some bugs

Er, Maybe there is a special case:

During VM's start-up, i found that the KVMslot changed many times, it was a process of
smashing total memory space into small slot.

If some pages was dirtied and its dirty-bitmap has been set in KVM module,
but we didn't sync the bitmaps to QEMU user-space before this slot was smashed,
with its previous bitmap been destroyed.
The bitmap of dirty pages in the previous KVMslot maybe be missed.

What's your opinion? Can this situation i described in the above happen?

The bellow log was grabbed, when i tried to figure out a quite same question (some pages miss dirty-bitmap setting) we found in COLO:
Occasionally, there will be an error report in SLAVE side:

     qemu: warning: error while loading state for instance 0x0 of device
     'kvm-tpr-opt'                                                 '
     qemu-system-x86_64: loadvm failed

We found that it related to three address (gpa: 0xca000,0xcb000,0xcc000, which are the address of 'kvmvapic.rom ?'), and sometimes
their corresponding dirty-map will be missed in Master side, because their KVMSlot is destroyed before we sync its dirty-bitmap to qemu.

(I'm still not quite sure if this can also happen in common migration, i will try to test it in normal migration)

Thanks,
zhanghailiang

/var/log/message:
Mar 31 18:32:53 master kernel: [15908.524721] memslot dirty bitmap of '4' was destroyed, base_gfn 0x100, npages 524032
Mar 31 18:32:53 master kernel: [15908.657853] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.665105] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.672360] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.682058] memslot dirty bitmap of '4' was destroyed, base_gfn 0xfebc0, npages 16
Mar 31 18:32:53 master kernel: [15908.849527] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.856845] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.864161] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.872676] memslot dirty bitmap of '4' was destroyed, base_gfn 0xfeb80, npages 64
Mar 31 18:32:53 master kernel: [15908.882694] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.889948] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.897202] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.911652] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.919543] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.927419] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.935317] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.943209] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.951083] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.958971] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.966837] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.974707] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.988470] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 524096
Mar 31 18:32:53 master kernel: [15909.002403] kvm: zapping shadow pages for mmio generation wraparound
Mar 31 18:32:53 master kernel: [15909.002523] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 8
Mar 31 18:32:53 master kernel: [15909.010110] memslot dirty bitmap of '4' was destroyed, base_gfn 0xc8, npages 524088
Mar 31 18:32:53 master kernel: [15909.023988] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 3
Mar 31 18:32:53 master kernel: [15909.031594] memslot dirty bitmap of '6' was destroyed, base_gfn 0xd0, npages 524080
Mar 31 18:32:53 master kernel: [15909.044708] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 11
Mar 31 18:32:53 master kernel: [15909.052392] memslot dirty bitmap of '6' was destroyed, base_gfn 0xd8, npages 524072
Mar 31 18:32:53 master kernel: [15909.065651] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 19
Mar 31 18:32:53 master kernel: [15909.073329] memslot dirty bitmap of '6' was destroyed, base_gfn 0xe0, npages 524064
Mar 31 18:32:53 master kernel: [15909.086379] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 27
Mar 31 18:32:53 master kernel: [15909.094084] memslot dirty bitmap of '6' was destroyed, base_gfn 0xe8, npages 524056
Mar 31 18:32:53 master kernel: [15909.107354] memslot dirty bitmap of '6' was destroyed, base_gfn 0xec, npages 524052
Mar 31 18:32:54 master dhcpcd[5408]: eth0: timed out
Mar 31 18:32:54 master dhcpcd[5408]: eth0: trying to use old lease in `/var/lib/dhcpcd/dhcpcd-eth0.info'
Mar 31 18:32:54 master dhcpcd[5408]: eth0: lease expired 30265 seconds ago
Mar 31 18:32:54 master dhcpcd[5408]: eth0: broadcasting for a lease
Mar 31 18:33:02 master qemu-system-x86_64: ====qemu: The 0 times do checkpoing===
Mar 31 18:33:02 master qemu-system-x86_64: == migration_bitmap_sync ==
Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xca000 (hva=0x7f4fa32ca000), who's bitmap not set ---
Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xcb000 (hva=0x7f4fa32cb000), who's bitmap not set ---
Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xcc000 (hva=0x7f4fa32cc000), who's bitmap not set ---
Mar 31 18:33:05 master kernel: [15921.057246] device eth2 left promiscuous mode
Mar 31 18:33:07 master avahi-daemon[5773]: Withdrawing address record for fe80::90be:cfff:fe9e:f03e on tap0.
Mar 31 18:33:07 master kernel: [15922.513313] br0: port 2(tap0) entered disabled state
Mar 31 18:33:07 master kernel: [15922.513480] device tap0 left promiscuous mode
Mar 31 18:33:07 master kernel: [15922.513508] br0: port 2(tap0) entered disabled state
Mar 31 18:33:07 master kernel: [15922.562591] memslot dirty bitmap of '8' was destroyed, base_gfn 0x100, npages 524032
Mar 31 18:33:07 master kernel: [15922.570948] memslot dirty bitmap of '3' was destroyed, base_gfn 0xfd000, npages 4096
Mar 31 18:33:07 master kernel: [15922.578967] memslot dirty bitmap of '0' was destroyed, base_gfn 0x0, npages 160
Mar 31 18:33:07 master kernel: [15922.586556] memslot dirty bitmap of '1' was destroyed, base_gfn 0xfffc0, npages 64
Mar 31 18:33:07 master kernel: [15922.586574] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 31
Mar 31 18:33:07 master kernel: [15922.586575] memslot dirty bitmap of '7' was destroyed, base_gfn 0xf0, npages 16
Mar 31 18:33:07 master kernel: [15922.586577] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 10
Mar 31 18:33:07 master kernel: [15922.586578] memslot dirty bitmap of '6' was destroyed, base_gfn 0xec, npages 4
Mar 31 18:33:07 master kernel: [15922.586579] memslot dirty bitmap of '4' was destroyed, base_gfn 0xca, npages 3

PS:
QEMU:
static void migration_bitmap_sync(void)
{
     trace_migration_bitmap_sync_end(migration_dirty_pages
                                     - num_dirty_pages_init);
     num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    syslog(LOG_INFO, "== migration_bitmap_sync ==");
+    check_bitmap(();

+void check_bitmap(void)
+{
+    int i;
+    char *host;
+    ram_addr_t addr[3] = { 0xca000, 0xcb000, 0xcc000 };
+    RAMBlock *block = NULL;
+    int ret;
+
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+    for (i = 0; i < 3; i++) {
+        unsigned long nr = block->mr->ram_addr + (addr[i] >> TARGET_PAGE_BITS);
+        host =  memory_region_get_ram_ptr(block->mr) + addr[i];
+        ret = test_bit(nr, migration_bitmap);
+        if (ret == 0) {
+            syslog(LOG_INFO, "qemu:  --- addr=0x%llx (hva=%p), who's bitmap not set ---\n",
+                       addr[i], host);
+        } else {
+            syslog(LOG_INFO, "qemu: +++ OK, addr=0x%llx (hva=%p) , who's bitap is set +++\n",
+                       addr[i], host);;
+        }
+    }
+}

KVM：
static void mark_page_dirty_in_slot(struct kvm *kvm,
                     struct kvm_memory_slot *memslot,
                     gfn_t gfn)
{
+    if ((gfn == 0xca || gfn == 0xcb || gfn == 0xcc) && !memslot->dirty_bitmap) {
+        printk("oops, dirty_bitmap is null, gfn=0x%llx\n",gfn);
+    }
     if (memslot && memslot->dirty_bitmap) {
         unsigned long rel_gfn = gfn - memslot->base_gfn;
+        if (gfn == 0xca || gfn == 0xcb || gfn == 0xcc) {
+            printk("+++ gfn=0x%llx is marked dirty, id=%d, base_gfn=0x%llx, npages=%d\n",
+                   gfn, memslot->id, memslot->base_gfn, memslot->npages);
+        }
         set_bit_le(rel_gfn, memslot->dirty_bitmap);
     }
}

static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
{
     if (!memslot->dirty_bitmap)
         return;
+    if (memslot->id != 9) {
+        printk("memslot dirty bitmap of '%d' was destroyed, base_gfn 0x%llx, npages %lld\n",
+           memslot->id, memslot->base_gfn, memslot->npages);
+    }
     kvm_kvfree(memslot->dirty_bitmap);
     memslot->dirty_bitmap = NULL;
}


> filed against migration during windows boot, I'd not considered that it might
> be devices not updating the bitmap.
>
> Dave
>
>>
>>
>> Thanks,
>> zhanghailiang
>>
>>
>>>>>>
>>>>>>> What kind of load were you having when reproducing this issue?
>>>>>>> Just to confirm, you have been able to reproduce this without COLO
>>>>>>> patches, right?
>>>>>>>
>>>>>>>> (qemu) migrate tcp:192.168.3.8:3004
>>>>>>>> before saving ram complete
>>>>>>>> ff703f6889ab8701e4e040872d079a28
>>>>>>>> md_host : after saving ram complete
>>>>>>>> ff703f6889ab8701e4e040872d079a28
>>>>>>>>
>>>>>>>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>>>>>>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>>>> md_host : after loading all vmstate
>>>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>>>> md_host : after cpu_synchronize_all_post_init
>>>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>>>>
>>>>>>>> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
>>>>>>> OK, a couple of things.  Memory don't have to be exactly identical.
>>>>>>> Virtio devices in particular do funny things on "post-load".  There
>>>>>>> aren't warantees for that as far as I know, we should end with an
>>>>>>> equivalent device state in memory.
>>>>>>>
>>>>>>>> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
>>>>>>>> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
>>>>>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>>>>>> This seems to point to a bug in one of the devices.
>>>>>>>
>>>>>>>> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
>>>>>>>> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
>>>>>>>> when do checkpoint in COLO FT, and everything will be OK.)
>>>>>>>>
>>>>>>>> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
>>>>>>> Pages transferred should be the same, after device state transmission is
>>>>>>> when things could change.
>>>>>>>
>>>>>>>> This issue has blocked our COLO development... :(
>>>>>>>>
>>>>>>>> Any help will be greatly appreciated!
>>>>>>> Later, Juan.
>>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>