From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38252)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZTstB-0003U1-CQ for qemu-devel@nongnu.org;
	Mon, 24 Aug 2015 10:38:22 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1ZTst9-0006SD-Ow for qemu-devel@nongnu.org;
	Mon, 24 Aug 2015 10:38:21 -0400
Received: from mx1.redhat.com ([209.132.183.28]:31090)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZTst9-0006Rd-H8 for qemu-devel@nongnu.org;
	Mon, 24 Aug 2015 10:38:19 -0400
Date: Mon, 24 Aug 2015 15:38:13 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20150824143812.GH2370@work-vm>
References: <1438159544-6224-1-git-send-email-zhang.zhanghailiang@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1438159544-6224-1-git-send-email-zhang.zhanghailiang@huawei.com>
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v8 00/34] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
To: zhanghailiang
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com,
	eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
	arei.gonglei@huawei.com, amit.shah@redhat.com

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> This is the 8th version of COLO.

I'm seeing an occasional error:
  pcibus_reset: Assertion `bus->irq_count[i] == 0' failed.

on the secondary; have you seen that?
bus->irq_count[4] is -1 in my backtrace; it's
colo_process_incoming_checkpoints->qemu_devices_reset->qbus_walk_children->qbus_reset_one->pcibus_reset

Dave

> This is only the COLO frame part, including: VM checkpoint,
> failover, proxy API, block replication API; block replication itself is not included.
> The block part is treated as a separate series.
>
> As usual, we provide 'basic' and 'developing' branches in github:
> https://github.com/coloft/qemu/commits/colo-v1.5-basic
> https://github.com/coloft/qemu/commits/colo-v1.5-developing (more features)
>
> The 'basic' branch is exactly the same as this patch series.
> We will keep this series as simple as possible, just for easy review.
>
> The extra features in colo-v1.5-developing branch:
> 1) Separate ram and device save/load process to reduce size of extra memory
>    used during checkpoint
> 2) Live migrate part of dirty pages to slave during sleep time.
> 3) You get the statistic info about checkpoint by command 'info migrate'
>
> Please refer to the following link to test COLO.
> http://wiki.qemu.org/Features/COLO.
>
> COLO is a totally new feature which is still in an early stage;
> your comments and feedback are warmly welcomed.
>
> NOTE:
> We have decided to re-implement the colo proxy in userspace (in qemu, to be exact).
> You can find the discussion about why & how to realize the colo proxy in qemu at the following link:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html
>
> TODO:
> 1. COLO function switch on/off
> 2. The capability of continuous FT
> 3. Optimize the performance.
>
> v8:
> - Move some global variables into MigrationIncomingState and MigrationState
> - Move some cleanup work from colo thread and colo incoming thread into failover
>   BH function and also fix the code logic for the cleanup work.
> - Fix the bug that colo thread and colo incoming thread possibly block in the
>   socket 'recv' call when doing failover work.
> - Optimize colo_flush_ram_cache()
> - Add migration state for incoming side; we use the state to verify whether the
>   migration incoming side is in COLO state or not (Patch 5).
> - Drop the patch 'COLO: Disable qdev hotplug when VM is in COLO mode', since it is not correct.
>
> zhanghailiang (34):
>   configure: Add parameter for configure to enable/disable COLO support
>   migration: Introduce capability 'colo' to migration
>   COLO: migrate colo related info to slave
>   colo-comm/migration: skip colo info section for special cases
>   migration: Add state records for migration incoming
>   migration: Integrate COLO checkpoint process into migration
>   migration: Integrate COLO checkpoint process into loadvm
>   COLO: Implement colo checkpoint protocol
>   COLO: Add a new RunState RUN_STATE_COLO
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   COLO: Save VM state to slave when do checkpoint
>   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
>   COLO VMstate: Load VM state into qsb before restore it
>   arch_init: Start to trace dirty pages of SVM
>   COLO RAM: Flush cached RAM into SVM's memory
>   COLO failover: Introduce a new command to trigger a failover
>   COLO failover: Introduce state to record failover process
>   COLO failover: Implement COLO primary/secondary vm failover work
>   qmp event: Add event notification for COLO error
>   COLO failover: Don't do failover during loading VM's state
>   COLO: Add new command parameter 'forward_nic' 'colo_script' for net
>   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
>   tap: Make launch_script() public
>   COLO NIC: Implement colo nic device interface configure()
>   colo-nic: Handle secondary VM's original net device configure
>   COLO NIC: Implement colo nic init/destroy function
>   COLO NIC: Some init work related with proxy module
>   COLO: Handle nfnetlink message from proxy module
>   COLO: Do checkpoint according to the result of packets comparation
>   COLO: Improve checkpoint efficiency by do additional periodic
>     checkpoint
>   COLO: Add colo-set-checkpoint-period command
>   COLO NIC: Implement NIC checkpoint and failover
>   COLO: Implement shutdown checkpoint
>   COLO: Add block replication into colo process
>
>  configure                     |  33 +-
>  docs/qmp/qmp-events.txt       |  16 +
>  hmp-commands.hx               |  30 ++
>  hmp.c                         |  15 +
>  hmp.h                         |   2 +
>  include/exec/cpu-all.h        |   1 +
>  include/migration/colo.h      |  45 +++
>  include/migration/failover.h  |  33 ++
>  include/migration/migration.h |  19 +
>  include/migration/qemu-file.h |   3 +-
>  include/net/colo-nic.h        |  37 ++
>  include/net/net.h             |   2 +
>  include/net/tap.h             |  19 +
>  include/sysemu/sysemu.h       |   3 +
>  migration/Makefile.objs       |   2 +
>  migration/colo-comm.c         |  75 ++++
>  migration/colo-failover.c     |  83 +++++
>  migration/colo.c              | 805 ++++++++++++++++++++++++++++++++++++++++++
>  migration/migration.c         | 116 ++++--
>  migration/qemu-file-buf.c     |  58 +++
>  migration/ram.c               | 242 ++++++++++++-
>  migration/savevm.c            |   2 +-
>  net/Makefile.objs             |   1 +
>  net/colo-nic.c                | 457 ++++++++++++++++++++++++
>  net/net.c                     |   2 +
>  net/tap.c                     |  90 +++--
>  qapi-schema.json              |  58 ++-
>  qapi/event.json               |  15 +
>  qemu-options.hx               |   7 +
>  qmp-commands.hx               |  42 +++
>  scripts/colo-proxy-script.sh  | 145 ++++++++
>  stubs/Makefile.objs           |   1 +
>  stubs/migration-colo.c        |  58 +++
>  trace-events                  |  10 +
>  vl.c                          |  37 +-
>  35 files changed, 2474 insertions(+), 90 deletions(-)
>  create mode 100644 include/migration/colo.h
>  create mode 100644 include/migration/failover.h
>  create mode 100644 include/net/colo-nic.h
>  create mode 100644 migration/colo-comm.c
>  create mode 100644 migration/colo-failover.c
>  create mode 100644 migration/colo.c
>  create mode 100644 net/colo-nic.c
>  create mode 100755 scripts/colo-proxy-script.sh
>  create mode 100644 stubs/migration-colo.c
>
> --
> 1.8.3.1
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK