From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1438159544-6224-1-git-send-email-zhang.zhanghailiang@huawei.com>
 <20150805112456.GF2331@work-vm>
From: zhanghailiang
Message-ID: <55C33602.2090801@huawei.com>
Date: Thu, 6 Aug 2015 18:25:06 +0800
MIME-Version: 1.0
In-Reply-To: <20150805112456.GF2331@work-vm>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v8 00/34] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
To: "Dr. David Alan Gilbert"
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com,
 eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
 arei.gonglei@huawei.com, amit.shah@redhat.com

On 2015/8/5 19:24, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> This is the 8th version of COLO.
>>
>> This series contains only the COLO frame part, including VM checkpoint,
>> failover, the proxy API and the block replication API, but not block
>> replication itself. The block part is treated as a separate series.
>>
>> As usual, we provide 'basic' and 'developing' branches on GitHub:
>> https://github.com/coloft/qemu/commits/colo-v1.5-basic
>> https://github.com/coloft/qemu/commits/colo-v1.5-developing (more features)
>>
>> The 'basic' branch is exactly the same as this patch series; we will keep
>> this series as simple as possible to make review easy.
>>
>> The extra features in the colo-v1.5-developing branch:
>> 1) Separate the RAM and device save/load processes to reduce the extra
>>    memory used during a checkpoint.
>> 2) Live-migrate part of the dirty pages to the slave during sleep time.
>> 3) Checkpoint statistics can be retrieved with the 'info migrate' command.
>
> I'm hitting a problem that I think is due to the new global_state section
> that Juan recently added; if I cause a failover I hit:
>
> ERROR: invalid runstate transition: 'colo' -> 'prelaunch'
>
> (on the secondary).
> I think the problem is that the global_state is only sent for 'unusual'
> states, so in the first migration, which is done at startup, 'prelaunch'
> is included in the stream in the global state, but for later checkpoints
> the global_state probably isn't sent.
>
> I hacked around it by making global_state_needed return false; I guess
> we need to find a better fix!
>

Yes, it is a known problem; I will look into it later. Thanks.

>
>
>> Please refer to the following link to test COLO:
>> http://wiki.qemu.org/Features/COLO
>>
>> COLO is a completely new feature that is still at an early stage;
>> your comments and feedback are warmly welcome.
>>
>> NOTE:
>> We have decided to re-implement the COLO proxy in userspace (in QEMU, to
>> be exact). You can find the discussion of why and how to implement the
>> COLO proxy in QEMU at the following link:
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html
>>
>> TODO:
>> 1. Switching the COLO function on/off
>> 2. Support for continuous FT
>> 3. Performance optimization
>>
>> v8:
>> - Move some global variables into MigrationIncomingState and MigrationState.
>> - Move some cleanup work from the colo thread and colo incoming thread into
>>   the failover BH function, and fix the logic of the cleanup work.
>> - Fix a bug where the colo thread and colo incoming thread could block in
>>   the socket 'recv' call during failover.
>> - Optimize colo_flush_ram_cache().
>> - Add a migration state for the incoming side; we use it to check whether
>>   the incoming side is in COLO state (patch 5).
>> - Drop the patch 'COLO: Disable qdev hotplug when VM is in COLO mode', since
>>   it is not correct.
>>
>> zhanghailiang (34):
>>   configure: Add parameter for configure to enable/disable COLO support
>>   migration: Introduce capability 'colo' to migration
>>   COLO: migrate colo related info to slave
>>   colo-comm/migration: skip colo info section for special cases
>>   migration: Add state records for migration incoming
>>   migration: Integrate COLO checkpoint process into migration
>>   migration: Integrate COLO checkpoint process into loadvm
>>   COLO: Implement colo checkpoint protocol
>>   COLO: Add a new RunState RUN_STATE_COLO
>>   QEMUSizedBuffer: Introduce two help functions for qsb
>>   COLO: Save VM state to slave when do checkpoint
>>   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
>>   COLO VMstate: Load VM state into qsb before restore it
>>   arch_init: Start to trace dirty pages of SVM
>>   COLO RAM: Flush cached RAM into SVM's memory
>>   COLO failover: Introduce a new command to trigger a failover
>>   COLO failover: Introduce state to record failover process
>>   COLO failover: Implement COLO primary/secondary vm failover work
>>   qmp event: Add event notification for COLO error
>>   COLO failover: Don't do failover during loading VM's state
>>   COLO: Add new command parameter 'forward_nic' 'colo_script' for net
>>   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
>>   tap: Make launch_script() public
>>   COLO NIC: Implement colo nic device interface configure()
>>   colo-nic: Handle secondary VM's original net device configure
>>   COLO NIC: Implement colo nic init/destroy function
>>   COLO NIC: Some init work related with proxy module
>>   COLO: Handle nfnetlink message from proxy module
>>   COLO: Do checkpoint according to the result of packets comparation
>>   COLO: Improve checkpoint efficiency by do additional periodic
>>     checkpoint
>>   COLO: Add colo-set-checkpoint-period command
>>   COLO NIC: Implement NIC checkpoint and failover
>>   COLO: Implement shutdown checkpoint
>>   COLO: Add block replication into colo process
>>
>>  configure                     |  33 +-
>>  docs/qmp/qmp-events.txt       |  16 +
>>  hmp-commands.hx               |  30 ++
>>  hmp.c                         |  15 +
>>  hmp.h                         |   2 +
>>  include/exec/cpu-all.h        |   1 +
>>  include/migration/colo.h      |  45 +++
>>  include/migration/failover.h  |  33 ++
>>  include/migration/migration.h |  19 +
>>  include/migration/qemu-file.h |   3 +-
>>  include/net/colo-nic.h        |  37 ++
>>  include/net/net.h             |   2 +
>>  include/net/tap.h             |  19 +
>>  include/sysemu/sysemu.h       |   3 +
>>  migration/Makefile.objs       |   2 +
>>  migration/colo-comm.c         |  75 ++++
>>  migration/colo-failover.c     |  83 +++++
>>  migration/colo.c              | 805 ++++++++++++++++++++++++++++++++++++++++++
>>  migration/migration.c         | 116 ++++--
>>  migration/qemu-file-buf.c     |  58 +++
>>  migration/ram.c               | 242 ++++++++++++-
>>  migration/savevm.c            |   2 +-
>>  net/Makefile.objs             |   1 +
>>  net/colo-nic.c                | 457 ++++++++++++++++++++++++
>>  net/net.c                     |   2 +
>>  net/tap.c                     |  90 +++--
>>  qapi-schema.json              |  58 ++-
>>  qapi/event.json               |  15 +
>>  qemu-options.hx               |   7 +
>>  qmp-commands.hx               |  42 +++
>>  scripts/colo-proxy-script.sh  | 145 ++++++++
>>  stubs/Makefile.objs           |   1 +
>>  stubs/migration-colo.c        |  58 +++
>>  trace-events                  |  10 +
>>  vl.c                          |  37 +-
>>  35 files changed, 2474 insertions(+), 90 deletions(-)
>>  create mode 100644 include/migration/colo.h
>>  create mode 100644 include/migration/failover.h
>>  create mode 100644 include/net/colo-nic.h
>>  create mode 100644 migration/colo-comm.c
>>  create mode 100644 migration/colo-failover.c
>>  create mode 100644 migration/colo.c
>>  create mode 100644 net/colo-nic.c
>>  create mode 100755 scripts/colo-proxy-script.sh
>>  create mode 100644 stubs/migration-colo.c
>>
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>