From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55526)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1XDFIH-0008S2-FA for qemu-devel@nongnu.org;
    Fri, 01 Aug 2014 12:03:01 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from )
    id 1XDFID-0002Qb-ED for qemu-devel@nongnu.org;
    Fri, 01 Aug 2014 12:02:57 -0400
Received: from mx1.redhat.com ([209.132.183.28]:22305)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1XDFID-0002QO-6h for qemu-devel@nongnu.org;
    Fri, 01 Aug 2014 12:02:53 -0400
Date: Fri, 1 Aug 2014 17:02:42 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20140801160242.GI2430@work-vm>
References: <1406125538-27992-1-git-send-email-yanghy@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1406125538-27992-1-git-send-email-yanghy@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO)
    Virtual Machines for Non-stop Service
To: Yang Hongyang
Cc: kvm@vger.kernel.org, GuiJianfeng@cn.fujitsu.com, eddie.dong@intel.com,
    qemu-devel@nongnu.org, mrhines@linux.vnet.ibm.com

* Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
> Virtual machine (VM) replication is a well known technique for
> providing application-agnostic software-implemented hardware fault
> tolerance "non-stop service". COLO is a high availability solution.
> Both primary VM (PVM) and secondary VM (SVM) run in parallel. They
> receive the same request from client, and generate response in parallel
> too. If the response packets from PVM and SVM are identical, they are
> released immediately. Otherwise, a VM checkpoint (on demand) is
> conducted. The idea is presented in Xen summit 2012, and 2013,
> and academia paper in SOCC 2013. It's also presented in KVM forum
> 2013:
> http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
> Please refer to above document for detailed information.
> Please also refer to previous posted RFC proposal:
> http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html

Hi Yang,
  Thanks for this set of patches (and I've replied to many individually).

> The patchset is also hosted on github:
> https://github.com/macrosheep/qemu/tree/colo_v0.1
>
> This patchset is RFC, implements the frame of colo, without
> failover and nic/disk replication. But it is ready for demo
> the COLO idea above QEMU-Kvm.
> Steps using this patchset to get an overview of COLO:
> 1. configure the source with --enable-colo option
> 2. compile
> 3. just like QEMU's normal migration, run 2 QEMU VM:
>    - Primary VM
>    - Secondary VM with -incoming tcp:[IP]:[PORT] option
> 4. on Primary VM's QEMU monitor, run following command:
>    migrate_set_capability colo on
>    migrate tcp:[IP]:[PORT]
> 5. done
> you will see two runing VMs, whenever you make changes to PVM, SVM
> will be synced to PVM's state.
>
> TODO list:
> 1. failover
> 2. nic replication
> 3. disk replication[COLO Disk manager]

I wonder if there are any parts that can be borrowed from other code
to get it going; I notice that the reverse execution patchset has a
network packet record/replay mode:
https://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00157.html

What was used for the nic comparison in the 2013 kvm forum paper?

Dave

>
> Any comments/feedbacks are warmly welcomed.
>
> Thanks,
> Yang
>
> Yang Hongyang (17):
>   configure: add CONFIG_COLO to switch COLO support
>   COLO: introduce an api colo_supported() to indicate COLO support
>   COLO migration: add a migration capability 'colo'
>   COLO info: use colo info to tell migration target colo is enabled
>   COLO save: integrate COLO checkpointed save into qemu migration
>   COLO restore: integrate COLO checkpointed restore into qemu restore
>   COLO buffer: implement colo buffer as well as QEMUFileOps based on it
>   COLO: disable qdev hotplug
>   COLO ctl: implement API's that communicate with colo agent
>   COLO ctl: introduce is_slave() and is_master()
>   COLO ctl: implement colo checkpoint protocol
>   COLO ctl: add a RunState RUN_STATE_COLO
>   COLO ctl: implement colo save
>   COLO ctl: implement colo restore
>   COLO save: reuse migration bitmap under colo checkpoint
>   COLO ram cache: implement colo ram cache on slaver
>   HACK: trigger checkpoint every 500ms
>
>  Makefile.objs                      |   2 +
>  arch_init.c                        | 174 +++++++++-
>  configure                          |  14 +
>  include/exec/cpu-all.h             |   1 +
>  include/migration/migration-colo.h |  36 +++
>  include/migration/migration.h      |  13 +
>  include/qapi/qmp/qerror.h          |   3 +
>  migration-colo-comm.c              |  78 +++++
>  migration-colo.c                   | 643 +++++++++++++++++++++++++++++++++++++
>  migration.c                        |  45 ++-
>  qapi-schema.json                   |   9 +-
>  stubs/Makefile.objs                |   1 +
>  stubs/migration-colo.c             |  34 ++
>  vl.c                               |  12 +
>  14 files changed, 1044 insertions(+), 21 deletions(-)
>  create mode 100644 include/migration/migration-colo.h
>  create mode 100644 migration-colo-comm.c
>  create mode 100644 migration-colo.c
>  create mode 100644 stubs/migration-colo.c
>
> --
> 1.9.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK