From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: zhanghailiang <zhang.zhanghailiang@huawei.com>,
	Juan Quintela <quintela@redhat.com>,
	qemu-devel@nongnu.org, Amit Shah <amit.shah@redhat.com>,
	david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
Date: Thu, 3 Dec 2015 11:22:27 +0000
Message-ID: <20151203112226.GE2591@work-vm>
In-Reply-To: <56601806.1010805@cn.fujitsu.com>

* Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
> 
> 
> On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote:
> >* Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
> >>Hi all,
> >>
> >>Does anybody remember the similar issue posted by hailiang months ago?
> >>  http://patchwork.ozlabs.org/patch/454322/
> >>At least two migration bugs have been fixed since then.
> >
> >Yes, I wondered what happened to that.
> >
> >>Now we see the same issue with a TCG VM (KVM is fine): after migration,
> >>the content of the VM's memory is inconsistent.
> >
> >Hmm, TCG only - I don't know much about that; but I guess something must
> >be accessing memory without using the proper macros/functions so
> >it doesn't mark it as dirty.
> >
> >>We added a patch to check the memory content; it is appended below.
> >>
> >>Steps to reproduce:
> >>1) apply the patch and re-build qemu
> >>2) prepare the ubuntu guest and run memtest in grub.
> >>source side:
> >>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
> >>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
> >>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
> >>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
> >>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
> >>pc-i440fx-2.3,accel=tcg,usb=off
> >>
> >>destination side:
> >>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
> >>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
> >>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
> >>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
> >>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
> >>pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
> >>
> >>3) start migration
> >>With a 1000 Mbit/s NIC, migration finishes within 3 minutes.
> >>
> >>at source:
> >>(qemu) migrate tcp:192.168.2.66:8881
> >>after saving ram complete
> >>e9e725df678d392b1a83b3a917f332bb
> >>qemu-system-x86_64: end ram md5
> >>(qemu)
> >>
> >>at destination:
> >>...skip...
> >>Completed load of VM with exit code 0 seq iteration 1264
> >>Completed load of VM with exit code 0 seq iteration 1265
> >>Completed load of VM with exit code 0 seq iteration 1266
> >>qemu-system-x86_64: after loading state section id 2(ram)
> >>49c2dac7bde0e5e22db7280dcb3824f9
> >>qemu-system-x86_64: end ram md5
> >>qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
> >>
> >>49c2dac7bde0e5e22db7280dcb3824f9
> >>qemu-system-x86_64: end ram md5
> >>
> >>This occurs occasionally and only on a TCG machine. It seems that
> >>some pages dirtied on the source side are not transferred to the destination.
> >>This problem can be reproduced even if we disable virtio.
> >>
> >>Is it OK for some pages not to be transferred to the destination during
> >>migration? Or is it a bug?
> >
> >I'm pretty sure that means it's a bug.  Hard to find though, I guess
> >at least memtest is smaller than a big OS.  I think I'd dump the whole
> >of memory on both sides, hexdump and diff them  - I'd guess it would
> >just be one byte/word different, maybe that would offer some idea what
> >wrote it.
> 
> I tried dumping and comparing them; more than 10 pages are different.
> On the source side they hold random values, whereas on the destination
> they are always 'FF' 'FB' 'EF' 'BF'...
> 
> Not all of the differing pages are contiguous.

I wonder if it happens on all of memtest's different test patterns;
it might be possible to narrow it down if you tell memtest
to run only one test at a time.
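
The dump-and-compare step discussed above could be scripted roughly as
follows. This is only a sketch, not something from the thread: it assumes
the two dumps are raw files of equal length (for example saved with the
monitor's pmemsave command on each side), and it assumes 4 KiB pages.
The function names differing_pages, contiguous_runs and byte_values are
made up for illustration.

```python
import sys
from typing import List, Set, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB guest pages


def differing_pages(src: bytes, dst: bytes,
                    page_size: int = PAGE_SIZE) -> List[int]:
    """Return the indices of pages whose contents differ."""
    if len(src) != len(dst):
        raise ValueError("dumps differ in length")
    pages = []
    for off in range(0, len(src), page_size):
        if src[off:off + page_size] != dst[off:off + page_size]:
            pages.append(off // page_size)
    return pages


def contiguous_runs(pages: List[int]) -> List[Tuple[int, int]]:
    """Group sorted page indices into inclusive (first, last) runs."""
    runs: List[Tuple[int, int]] = []
    for p in pages:
        if runs and p == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], p)   # extend the current run
        else:
            runs.append((p, p))           # start a new run
    return runs


def byte_values(data: bytes, page: int,
                page_size: int = PAGE_SIZE) -> Set[int]:
    """Return the set of distinct byte values found in one page."""
    return set(data[page * page_size:(page + 1) * page_size])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python pagediff.py src.bin dst.bin
    with open(sys.argv[1], "rb") as f:
        src = f.read()
    with open(sys.argv[2], "rb") as f:
        dst = f.read()
    pages = differing_pages(src, dst)
    for first, last in contiguous_runs(pages):
        print("pages 0x%x-0x%x differ" % (first, last))
    for p in pages:
        # A small set of values on one side hints at a fixed test pattern.
        print("page 0x%x: %d distinct bytes on source, %d on destination"
              % (p, len(byte_values(src, p)), len(byte_values(dst, p))))
```

Listing the runs shows directly whether the differing pages are
contiguous, and the per-page byte-value counts help tell pattern-filled
pages from random-looking ones, which may point at a particular memtest
test.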

Dave

> 
> thanks
> Li
> 
> 
> >
> >Dave
> >
> >>Any idea...
> >>
> >>=================md5 check patch=============================
> >>
> >>diff --git a/Makefile.target b/Makefile.target
> >>index 962d004..e2cb8e9 100644
> >>--- a/Makefile.target
> >>+++ b/Makefile.target
> >>@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
> >>  obj-y += memory_mapping.o
> >>  obj-y += dump.o
> >>  obj-y += migration/ram.o migration/savevm.o
> >>-LIBS := $(libs_softmmu) $(LIBS)
> >>+LIBS := $(libs_softmmu) $(LIBS) -lplumb
> >>
> >>  # xen support
> >>  obj-$(CONFIG_XEN) += xen-common.o
> >>diff --git a/migration/ram.c b/migration/ram.c
> >>index 1eb155a..3b7a09d 100644
> >>--- a/migration/ram.c
> >>+++ b/migration/ram.c
> >>@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >>      }
> >>
> >>      rcu_read_unlock();
> >>-    DPRINTF("Completed load of VM with exit code %d seq iteration "
> >>+    fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
> >>              "%" PRIu64 "\n", ret, seq_iter);
> >>      return ret;
> >>  }
> >>diff --git a/migration/savevm.c b/migration/savevm.c
> >>index 0ad1b93..3feaa61 100644
> >>--- a/migration/savevm.c
> >>+++ b/migration/savevm.c
> >>@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
> >>
> >>  }
> >>
> >>+#include "exec/ram_addr.h"
> >>+#include "qemu/rcu_queue.h"
> >>+#include <clplumbing/md5.h>
> >>+#ifndef MD5_DIGEST_LENGTH
> >>+#define MD5_DIGEST_LENGTH 16
> >>+#endif
> >>+
> >>+static void check_host_md5(void)
> >>+{
> >>+    int i;
> >>+    unsigned char md[MD5_DIGEST_LENGTH];
> >>+    rcu_read_lock();
> >>+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
> >>+    rcu_read_unlock();
> >>+
> >>+    MD5(block->host, block->used_length, md);
> >>+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
> >>+        fprintf(stderr, "%02x", md[i]);
> >>+    }
> >>+    fprintf(stderr, "\n");
> >>+    error_report("end ram md5");
> >>+}
> >>+
> >>  void qemu_savevm_state_begin(QEMUFile *f,
> >>                               const MigrationParams *params)
> >>  {
> >>@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
> >>          save_section_header(f, se, QEMU_VM_SECTION_END);
> >>
> >>          ret = se->ops->save_live_complete_precopy(f, se->opaque);
> >>+
> >>+        fprintf(stderr, "after saving %s complete\n", se->idstr);
> >>+        check_host_md5();
> >>+
> >>          trace_savevm_section_end(se->idstr, se->section_id, ret);
> >>          save_section_footer(f, se);
> >>          if (ret < 0) {
> >>@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> >>                               section_id, le->se->idstr);
> >>                  return ret;
> >>              }
> >>+            if (section_type == QEMU_VM_SECTION_END) {
> >>+                error_report("after loading state section id %d(%s)",
> >>+                             section_id, le->se->idstr);
> >>+                check_host_md5();
> >>+            }
> >>              if (!check_section_footer(f, le)) {
> >>                  return -EINVAL;
> >>              }
> >>@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
> >>      }
> >>
> >>      cpu_synchronize_all_post_init();
> >>+    error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
> >>+    check_host_md5();
> >>
> >>      return ret;
> >>  }
> >>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >
> >.
> >
> 
> -- 
> Best regards.
> Li Zhijian (8555)
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
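
For cross-checking outside QEMU, the digest of a saved RAM dump can be
computed with a few lines of Python. This is a sketch under assumptions,
not part of the quoted patch: the helper names md5_hex and md5_of_file
are hypothetical, and the result only matches what check_host_md5()
prints if the dump file covers exactly the bytes that function hashes
(block->host for block->used_length bytes), which a physical-memory dump
may not.

```python
import hashlib
import sys


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally so large dumps need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python dumpmd5.py src.bin dst.bin
    for path in sys.argv[1:]:
        print(md5_of_file(path), path)
```

Running it on the source and destination dumps gives a quick yes/no
answer before doing a page-by-page comparison.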

Thread overview: 9+ messages
2015-12-03  7:32 [Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration Li Zhijian
2015-12-03  9:24 ` Dr. David Alan Gilbert
2015-12-03  9:37   ` Hailiang Zhang
2015-12-03 10:23     ` Li Zhijian
2015-12-03 10:23   ` Li Zhijian
2015-12-03 11:22     ` Dr. David Alan Gilbert [this message]
2015-12-03 11:20 ` Juan Quintela
2015-12-04  1:43   ` Li, Liang Z
2015-12-17  6:07     ` Amit Shah
