From: Li Zhijian <lizhijian@cn.fujitsu.com>
To: Hailiang Zhang <zhang.zhanghailiang@huawei.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>,
	peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
	Amit Shah <amit.shah@redhat.com>,
	david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
Date: Thu, 3 Dec 2015 18:23:58 +0800
Message-ID: <5660183E.6050907@cn.fujitsu.com>
In-Reply-To: <56600D47.7050004@huawei.com>



On 12/03/2015 05:37 PM, Hailiang Zhang wrote:
> On 2015/12/3 17:24, Dr. David Alan Gilbert wrote:
>> * Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
>>> Hi all,
>>>
>>> Does anybody remember the similar issue posted by hailiang months ago?
>>>   http://patchwork.ozlabs.org/patch/454322/
>>> At least two migration bugs have been fixed since then.
>>
>> Yes, I wondered what happened to that.
>>
>>> Now we have found the same issue on a TCG VM (KVM is fine): after
>>> migration, the content of the VM's memory is inconsistent.
>>
>> Hmm, TCG only - I don't know much about that; but I guess something must
>> be accessing memory without using the proper macros/functions so
>> it doesn't mark it as dirty.
>>
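
To make the suspected failure mode concrete, here is a toy model of
precopy with a dirty bitmap. It is only a sketch and not QEMU code:
ToyRam, migrate_round and the mark_dirty flag are hypothetical names,
and the 4 KiB page size is an assumption. A write that skips the dirty
marking is never resent, so the destination keeps the stale page.

```python
# Toy model of precopy migration with a dirty bitmap (illustration only,
# not QEMU code; all names here are hypothetical).
PAGE_SIZE = 4096  # assumed page size


class ToyRam:
    def __init__(self, num_pages):
        self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        self.dirty = set(range(num_pages))  # every page starts out dirty

    def write(self, page, data, mark_dirty=True):
        """Guest write; mark_dirty=False models a path that skips dirty logging."""
        self.pages[page] = data
        if mark_dirty:
            self.dirty.add(page)


def migrate_round(src, dst_pages):
    """Copy every page currently marked dirty, then clear the bitmap."""
    sent = sorted(src.dirty)
    for page in sent:
        dst_pages[page] = src.pages[page]
    src.dirty.clear()
    return sent


def mismatched_pages(src, dst_pages):
    """Indices of pages whose content differs between source and destination."""
    return [i for i, p in enumerate(src.pages) if dst_pages[i] != p]


src = ToyRam(4)
dst = [None] * 4
migrate_round(src, dst)                              # first pass sends all pages
src.write(1, b"\x01" * PAGE_SIZE)                    # tracked write
src.write(2, b"\x02" * PAGE_SIZE, mark_dirty=False)  # untracked write
migrate_round(src, dst)                              # final pass resends page 1 only
print(mismatched_pages(src, dst))                    # [2]
```

Only the page written without dirty marking ends up different, which
matches the symptom of a few pages being inconsistent after migration.
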
>>> We added a patch to check the memory content; it is appended at the end of this mail.
>>>
>>> Steps to reproduce:
>>> 1) Apply the patch and rebuild QEMU.
>>> 2) Prepare the Ubuntu guest and run memtest from GRUB.
>>> source side:
>>> x86_64-softmmu/qemu-system-x86_64 \
>>>   -netdev tap,id=hn0 \
>>>   -device e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 \
>>>   -boot c \
>>>   -drive if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 \
>>>   -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 \
>>>   -vnc :7 -m 128 -smp 1 \
>>>   -device piix3-usb-uhci -device usb-tablet \
>>>   -qmp tcp::4444,server,nowait \
>>>   -monitor stdio \
>>>   -cpu qemu64 \
>>>   -machine pc-i440fx-2.3,accel=tcg,usb=off
>>>
>>> destination side:
>>> x86_64-softmmu/qemu-system-x86_64 \
>>>   -netdev tap,id=hn0 \
>>>   -device e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 \
>>>   -boot c \
>>>   -drive if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 \
>>>   -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 \
>>>   -vnc :7 -m 128 -smp 1 \
>>>   -device piix3-usb-uhci -device usb-tablet \
>>>   -qmp tcp::4444,server,nowait \
>>>   -monitor stdio \
>>>   -cpu qemu64 \
>>>   -machine pc-i440fx-2.3,accel=tcg,usb=off \
>>>   -incoming tcp:0:8881
>>>
>>> 3) Start migration.
>>> With a 1000M NIC, migration finishes within 3 min.
>>>
>>> at source:
>>> (qemu) migrate tcp:192.168.2.66:8881
>>> after saving ram complete
>>> e9e725df678d392b1a83b3a917f332bb
>>> qemu-system-x86_64: end ram md5
>>> (qemu)
>>>
>>> at destination:
>>> ...skip...
>>> Completed load of VM with exit code 0 seq iteration 1264
>>> Completed load of VM with exit code 0 seq iteration 1265
>>> Completed load of VM with exit code 0 seq iteration 1266
>>> qemu-system-x86_64: after loading state section id 2(ram)
>>> 49c2dac7bde0e5e22db7280dcb3824f9
>>> qemu-system-x86_64: end ram md5
>>> qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
>>>
>>> 49c2dac7bde0e5e22db7280dcb3824f9
>>> qemu-system-x86_64: end ram md5
>>>
>>> This occurs occasionally and only on a TCG machine. It seems that
>>> some pages dirtied on the source side are not transferred to the destination.
>>> The problem can be reproduced even if we disable virtio.
>>>
>>> Is it OK for some pages not to be transferred to the destination during
>>> migration? Or is it a bug?
>>
>> I'm pretty sure that means it's a bug.  Hard to find though, I guess
>> at least memtest is smaller than a big OS.  I think I'd dump the whole
>> of memory on both sides, hexdump and diff them  - I'd guess it would
>> just be one byte/word different, maybe that would offer some idea what
>> wrote it.
>>
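
The dump-and-diff idea could be scripted roughly as below. This is a
minimal sketch under assumptions: both dumps are raw byte images of the
same RAM block with equal length, the page size is 4 KiB, and the
function name diff_dumps is made up for illustration.

```python
PAGE_SIZE = 4096  # assumed guest page size


def diff_dumps(src, dst, page_size=PAGE_SIZE):
    """Return {page_index: [byte offsets within that page that differ]}."""
    if len(src) != len(dst):
        raise ValueError("dumps differ in length")
    result = {}
    for start in range(0, len(src), page_size):
        a = src[start:start + page_size]
        b = dst[start:start + page_size]
        if a != b:  # cheap whole-page check before the per-byte scan
            result[start // page_size] = [
                i for i, (x, y) in enumerate(zip(a, b)) if x != y
            ]
    return result


# Example with two in-memory images that differ in a single byte.
source_image = bytes(2 * PAGE_SIZE)
dest_image = bytearray(source_image)
dest_image[PAGE_SIZE + 4] = 0x01
print(diff_dumps(source_image, bytes(dest_image)))  # {1: [4]}
```

For real dumps, read each file with open(path, "rb").read() and pass
the two byte strings in; with -m 128 both images fit in memory easily.
The page index and offsets then show whether the difference is a single
byte or word, as guessed above, or something larger.
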
>
> Maybe a better way to do that is with the help of userfaultfd's
> write-protect capability. It is still under development by Andrea
> Arcangeli, but an RFC version is available; please refer to
> http://www.spinics.net/lists/linux-mm/msg97422.html
> (I'm developing a live memory snapshot feature based on it; maybe this
> is another scenario where we can use userfaultfd's WP ;) ).

Sounds good.

Thanks,
Li

>
>
>> Dave
>>
>>> Any ideas?
>>>
>>> =================md5 check patch=============================
>>>
>>> diff --git a/Makefile.target b/Makefile.target
>>> index 962d004..e2cb8e9 100644
>>> --- a/Makefile.target
>>> +++ b/Makefile.target
>>> @@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
>>>   obj-y += memory_mapping.o
>>>   obj-y += dump.o
>>>   obj-y += migration/ram.o migration/savevm.o
>>> -LIBS := $(libs_softmmu) $(LIBS)
>>> +LIBS := $(libs_softmmu) $(LIBS) -lplumb
>>>
>>>   # xen support
>>>   obj-$(CONFIG_XEN) += xen-common.o
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index 1eb155a..3b7a09d 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>>       }
>>>
>>>       rcu_read_unlock();
>>> -    DPRINTF("Completed load of VM with exit code %d seq iteration "
>>> +    fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
>>>               "%" PRIu64 "\n", ret, seq_iter);
>>>       return ret;
>>>   }
>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>> index 0ad1b93..3feaa61 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
>>>
>>>   }
>>>
>>> +#include "exec/ram_addr.h"
>>> +#include "qemu/rcu_queue.h"
>>> +#include <clplumbing/md5.h>
>>> +#ifndef MD5_DIGEST_LENGTH
>>> +#define MD5_DIGEST_LENGTH 16
>>> +#endif
>>> +
>>> +static void check_host_md5(void)
>>> +{
>>> +    int i;
>>> +    unsigned char md[MD5_DIGEST_LENGTH];
>>> +    rcu_read_lock();
>>> +    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks); /* Only check 'pc.ram' block */
>>> +    rcu_read_unlock();
>>> +
>>> +    MD5(block->host, block->used_length, md);
>>> +    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
>>> +        fprintf(stderr, "%02x", md[i]);
>>> +    }
>>> +    fprintf(stderr, "\n");
>>> +    error_report("end ram md5");
>>> +}
>>> +
>>>   void qemu_savevm_state_begin(QEMUFile *f,
>>>                                const MigrationParams *params)
>>>   {
>>> @@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
>>>           save_section_header(f, se, QEMU_VM_SECTION_END);
>>>
>>>           ret = se->ops->save_live_complete_precopy(f, se->opaque);
>>> +
>>> +        fprintf(stderr, "after saving %s complete\n", se->idstr);
>>> +        check_host_md5();
>>> +
>>>           trace_savevm_section_end(se->idstr, se->section_id, ret);
>>>           save_section_footer(f, se);
>>>           if (ret < 0) {
>>> @@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>>>                                section_id, le->se->idstr);
>>>                   return ret;
>>>               }
>>> +            if (section_type == QEMU_VM_SECTION_END) {
>>> +                error_report("after loading state section id %d(%s)",
>>> +                             section_id, le->se->idstr);
>>> +                check_host_md5();
>>> +            }
>>>               if (!check_section_footer(f, le)) {
>>>                   return -EINVAL;
>>>               }
>>> @@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
>>>       }
>>>
>>>       cpu_synchronize_all_post_init();
>>> +    error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
>>> +    check_host_md5();
>>>
>>>       return ret;
>>>   }
>>>
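
The patch hashes the whole 'pc.ram' block, so it can only say that
something differs. A per-page variant of the same check would also say
where. The sketch below shows the idea on plain byte buffers; it is not
part of the patch, page_digests and first_mismatch are hypothetical
names, and the 4 KiB page size is an assumption.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size


def page_digests(ram, page_size=PAGE_SIZE):
    """MD5 hex digest of each page-sized chunk of a RAM image."""
    return [
        hashlib.md5(ram[off:off + page_size]).hexdigest()
        for off in range(0, len(ram), page_size)
    ]


def first_mismatch(src_digests, dst_digests):
    """Index of the first page whose digests differ, or None if all match."""
    for index, (s, d) in enumerate(zip(src_digests, dst_digests)):
        if s != d:
            return index
    return None


# Example: two three-page images that differ by one byte in the last page.
source_ram = bytearray(3 * PAGE_SIZE)
dest_ram = bytearray(source_ram)
dest_ram[2 * PAGE_SIZE + 7] = 0xFF
print(first_mismatch(page_digests(source_ram), page_digests(dest_ram)))  # 2
```

Each side would print its digest list at the same points where the
patch prints the single MD5, and the two lists can then be compared
without copying the RAM itself off the hosts.
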
>>>
>>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>
>> .
>>
>
>
>
>
> .
>

-- 
Best regards.
Li Zhijian (8555)

Thread overview: 9+ messages
2015-12-03  7:32 [Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration Li Zhijian
2015-12-03  9:24 ` Dr. David Alan Gilbert
2015-12-03  9:37   ` Hailiang Zhang
2015-12-03 10:23     ` Li Zhijian [this message]
2015-12-03 10:23   ` Li Zhijian
2015-12-03 11:22     ` Dr. David Alan Gilbert
2015-12-03 11:20 ` Juan Quintela
2015-12-04  1:43   ` Li, Liang Z
2015-12-17  6:07     ` Amit Shah
