From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:33086)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZX4Ay-0002p8-NC for qemu-devel@nongnu.org; Wed, 02 Sep 2015 05:17:54 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from )
	id 1ZX4Au-0000PM-JV for qemu-devel@nongnu.org; Wed, 02 Sep 2015 05:17:52 -0400
Received: from szxga01-in.huawei.com ([58.251.152.64]:18252)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZX4At-0000O4-GF for qemu-devel@nongnu.org; Wed, 02 Sep 2015 05:17:48 -0400
References: <1441182199-8328-1-git-send-email-zhang.zhanghailiang@huawei.com> <55E6BB69.90409@cn.fujitsu.com>
From: zhanghailiang
Message-ID: <55E6BE9A.2010408@huawei.com>
Date: Wed, 2 Sep 2015 17:17:14 +0800
MIME-Version: 1.0
In-Reply-To: <55E6BB69.90409@cn.fujitsu.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v9 00/32] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
To: Yang Hongyang, qemu-devel@nongnu.org, stefanha@redhat.com
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, dgilbert@redhat.com, arei.gonglei@huawei.com, amit.shah@redhat.com

On 2015/9/2 17:03, Yang Hongyang wrote:
> Hi Stefan,
>
> As we discussed at KVM Forum, a stripped down version of COLO has been
> implemented, which only contains the periodic checkpoint mode. Thanks to the
> QEMU-space netbuffer (we can drop all those complex network settings), this
> version is easier to set up and test: all you need is a pair of machines,
> both of which check out
> https://github.com/coloft/qemu/commits/colo-v2.0-periodic-mode
> and compile. Run the test procedure mentioned below and it's done.
>

Er, sorry, I missed one step for the test.
On the slave (secondary) side, before starting qemu, we should create the
ramdisk images for the disk:
# qemu-img create -f qcow2 /mnt/ramfs/active_disk.img 10G
# qemu-img create -f qcow2 /mnt/ramfs/hidden_disk.img 10G

Please note, the size should be the same as the VM's disk.

Thanks.
zhanghailiang

> Thanks!
>
> On 09/02/2015 04:22 PM, zhanghailiang wrote:
>> This is the 9th version of COLO.
>>
>> Please note that this version is very different from the previous versions,
>> since we have decided to implement the proxy in qemu, based on slirp in qemu.
>> We dropped all the original colo proxy related parts.
>>
>> It will be a long time before the proxy is ready for merging, so here we extract
>> the basic periodic checkpoint part that does not depend on the proxy into this series.
>> Actually, the 'periodic' mode is also something we want to support in COLO. It is
>> based on Yang Hongyang's netfilter series, and this mode is very much like
>> MicroCheckpointing and Remus.
>>
>> You can find the discussion about why & how to implement the colo proxy in qemu
>> at the following link:
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html
>>
>> As usual, this is only the COLO frame part; you can get the whole code from github:
>> https://github.com/coloft/qemu/commits/colo-v2.0-periodic-mode
>>
>> Compared with previous versions, this version is easier to test.
>>
>> Test procedure:
>> 1. Start qemu
>> Primary side:
>> # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -netdev tap,id=bn0 -netfilter buffer,id=f0,netdev=bn0,chain=in -device virtio-net-pci,id=net-pci0,netdev=bn0 -boot c -drive if=virtio,id=disk1,driver=quorum,read-pattern=fifo,cache=none,aio=native,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -S
>>
>> Secondary side:
>> # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 -drive if=none,driver=raw,file=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,id=colo1,cache=none,aio=native -drive if=virtio,driver=replication,mode=secondary,throttling.bps-total=70000000,file.file.filename=/mnt/ramfs/active_disk.img,file.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.driver=qcow2,file.backing.backing.backing_reference=colo1,file.backing.allow-write-backing-file=on -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:8888
>>
>> 2. On the Secondary VM's QEMU monitor, issue the commands:
>> (qemu) nbd_server_start 192.168.2.88:8889
>> (qemu) nbd_server_add -w colo1
>>
>> 3. On the Primary VM's QEMU monitor, issue the commands:
>> (qemu) child_add disk1 child.driver=replication,child.mode=primary,child.file.host=192.168.2.88,child.file.port=8889,child.file.export=colo1,child.file.driver=nbd,child.ignore-errors=on
>> (qemu) migrate_set_capability colo on
>> (qemu) migrate tcp:192.168.2.88:8888
>>
>> 4. After the above steps, you will see that whenever you make changes to the PVM, the SVM will be synced.
>> You can issue the command "migrate_set_parameter checkpoint-delay 2000"
>> to change the checkpoint period.
>>
>> 5. Failover test
>> You can kill the PVM and run 'colo_lost_heartbeat' in the SVM's
>> monitor at the same time; the SVM will then fail over and the client will not notice the change.
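
The monitor commands in steps 2 and 3 hard-code the secondary host's address (192.168.2.88) and ports. Below is a minimal sketch for generating the primary-side commands for a different setup, using only the command syntax quoted above. The function name colo_primary_cmds and its argument order are hypothetical (not part of QEMU or this series), and the script only prints text to paste into the monitor; it does not talk to QEMU.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU or the COLO series): prints the
# primary-side monitor commands from step 3 for a given secondary host.
# Usage: colo_primary_cmds HOST NBD_PORT MIGRATE_PORT [EXPORT] [QUORUM_ID]
colo_primary_cmds() {
    host=$1
    nbd_port=$2
    mig_port=$3
    export_name=${4:-colo1}   # NBD export name added in step 2
    quorum_id=${5:-disk1}     # id= of the quorum drive on the primary

    # Attach the secondary's NBD export as a replication child of the quorum.
    printf 'child_add %s child.driver=replication,child.mode=primary,' "$quorum_id"
    printf 'child.file.host=%s,child.file.port=%s,' "$host" "$nbd_port"
    printf 'child.file.export=%s,child.file.driver=nbd,child.ignore-errors=on\n' "$export_name"
    # Enable the colo capability, then start the migration to the secondary.
    printf 'migrate_set_capability colo on\n'
    printf 'migrate tcp:%s:%s\n' "$host" "$mig_port"
}

# Reproduces the three commands of step 3 with the addresses used above.
colo_primary_cmds 192.168.2.88 8889 8888
```

The NBD port must match the one given to nbd_server_start on the secondary, and the migrate port must match the secondary's -incoming option.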
>>
>> COLO is a totally new feature which is still in an early stage;
>> your comments and feedback are warmly welcomed.
>>
>> TODO:
>> 1. checkpoint based on proxy in qemu
>> 2. The capability of continuous FT
>>
>> v9:
>> - Drop colo proxy related part (colo-nic.c file)
>> - Convert COLO protocol name definition to QAPI
>> - Smash failover related patch (patch 19/20/23)
>> - Fix colo exit event according to Eric's comments.
>> - Fix some typos from Eric's comments
>> - Fix bug 'invalid runstate transition: 'colo' -> 'prelaunch' reported
>>   by Dave (patch 27)
>> - Use migrate_set_parameter instead of ecolo-set-checkpoint-period to set
>>   checkpoint delay time (patch 25)
>> - Add new patches (patch 29/30) to separate the process of saving/loading
>>   device and state during checkpoint, which will reduce the data size
>>   for sending and also reduce the qsb size used in checkpoint.
>>
>> Wen Congyang (1):
>>   COLO: Add block replication into colo process
>>
>> zhanghailiang (31):
>>   configure: Add parameter for configure to enable/disable COLO support
>>   migration: Introduce capability 'colo' to migration
>>   COLO: migrate colo related info to slave
>>   migration: Add state records for migration incoming
>>   migration: Integrate COLO checkpoint process into migration
>>   migration: Integrate COLO checkpoint process into loadvm
>>   migration: Rename the'file' member of MigrationState and
>>     MigrationIncomingState
>>   COLO/migration: establish a new communication path from destination to
>>     source
>>   COLO: Implement colo checkpoint protocol
>>   COLO: Add a new RunState RUN_STATE_COLO
>>   QEMUSizedBuffer: Introduce two help functions for qsb
>>   COLO: Save PVM state to secondary side when do checkpoint
>>   COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
>>   COLO: Load VMState into qsb before restore it
>>   COLO: Flush PVM's cached RAM into SVM's memory
>>   COLO: synchronize PVM's state to SVM periodically
>>   COLO failover: Introduce a new command to trigger a failover
>>   COLO failover: Introduce state to record failover process
>>   COLO: Implement failover work for Primary VM
>>   COLO: Implement failover work for Secondary VM
>>   COLO: implement default failover treatment
>>   qmp event: Add event notification for COLO error
>>   COLO failover: Shutdown related socket fd when do failover
>>   COLO failover: Don't do failover during loading VM's state
>>   COLO: Control the checkpoint delay time by migrate-set-parameters
>>     command
>>   COLO: Implement shutdown checkpoint
>>   COLO: Update the global runstate after going into colo state
>>   savevm: Split load vm state function qemu_loadvm_state
>>   COLO: Separate the process of saving/loading ram and device state
>>   COLO: Split qemu_savevm_state_begin out of checkpoint process
>>   COLO: Add net packets treatment into COLO
>>
>>  configure                     |  11 +
>>  docs/qmp/qmp-events.txt       |  17 +
>>  hmp-commands.hx               |  15 +
>>  hmp.c                         |  16 +
>>  hmp.h                         |   1 +
>>  include/exec/cpu-all.h        |   1 +
>>  include/migration/colo.h      |  44 +++
>>  include/migration/failover.h  |  33 ++
>>  include/migration/migration.h |  16 +-
>>  include/migration/qemu-file.h |   3 +-
>>  include/sysemu/sysemu.h       |   8 +
>>  migration/Makefile.objs       |   2 +
>>  migration/colo-comm.c         |  75 ++++
>>  migration/colo-failover.c     |  83 +++++
>>  migration/colo.c              | 782 ++++++++++++++++++++++++++++++++++++++++++
>>  migration/exec.c              |   4 +-
>>  migration/fd.c                |   4 +-
>>  migration/migration.c         | 184 +++++++---
>>  migration/qemu-file-buf.c     |  58 ++++
>>  migration/ram.c               | 185 +++++++++-
>>  migration/savevm.c            | 309 +++++++++++++----
>>  migration/tcp.c               |   4 +-
>>  migration/unix.c              |   4 +-
>>  qapi-schema.json              | 101 +++++-
>>  qapi/event.json               |  17 +
>>  qmp-commands.hx               |  20 ++
>>  stubs/Makefile.objs           |   1 +
>>  stubs/migration-colo.c        |  45 +++
>>  trace-events                  |   8 +
>>  vl.c                          |  37 +-
>>  30 files changed, 1930 insertions(+), 158 deletions(-)
>>  create mode 100644 include/migration/colo.h
>>  create mode 100644 include/migration/failover.h
>>  create mode 100644 migration/colo-comm.c
>>  create mode 100644 migration/colo-failover.c
>>  create mode 100644 migration/colo.c
>>  create mode 100644 stubs/migration-colo.c
>>
>
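
The extra step at the top of this mail (creating the active and hidden images on the secondary, sized to match the VM's disk) can be sketched as follows. The function name colo_image_cmds is hypothetical, and it only prints the two qemu-img create commands rather than running them, so the size can be checked first.

```shell
#!/bin/sh
# Hypothetical helper: prints the qemu-img commands for the secondary's
# active and hidden images. SIZE may be a byte count or a suffixed size
# such as 10G; it should equal the virtual size of the VM's disk.
# Usage: colo_image_cmds DIR SIZE
colo_image_cmds() {
    dir=$1
    size=$2
    for name in active_disk hidden_disk; do
        printf 'qemu-img create -f qcow2 %s/%s.img %s\n' "$dir" "$name" "$size"
    done
}

# Same paths and size as in the commands at the top of this mail.
colo_image_cmds /mnt/ramfs 10G
```

To avoid guessing the size, the virtual size in bytes can be read from the "virtual-size" field printed by qemu-img info --output=json for the VM's disk and passed as SIZE.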