From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:47468)
 by lists.gnu.org with esmtp (Exim 4.71) id 1Yssji-0000JB-25
 for qemu-devel@nongnu.org; Thu, 14 May 2015 08:59:39 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 id 1Yssje-0007OS-OD
 for qemu-devel@nongnu.org; Thu, 14 May 2015 08:59:37 -0400
Received: from szxga03-in.huawei.com ([119.145.14.66]:12962)
 by eggs.gnu.org with esmtp (Exim 4.71) id 1Yssjd-0007E8-P0
 for qemu-devel@nongnu.org; Thu, 14 May 2015 08:59:34 -0400
Message-ID: <55549BFB.4030906@huawei.com>
Date: Thu, 14 May 2015 20:58:35 +0800
From: zhanghailiang
MIME-Version: 1.0
References: <1427347774-8960-1-git-send-email-zhang.zhanghailiang@huawei.com>
 <20150514121419.GE2576@work-vm>
In-Reply-To: <20150514121419.GE2576@work-vm>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
To: "Dr. David Alan Gilbert"
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com,
 eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
 arei.gonglei@huawei.com, amit.shah@redhat.com, david@gibson.dropbear.id.au

On 2015/5/14 20:14, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> This is the 4th version of COLO. It covers only the COLO framework part: VM checkpoint,
>> failover, proxy API, and block replication API. It does not include block replication itself.
>> The block part has been sent by wencongyang:
>> [RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
>>
>> Compared with the last version, there are not many optimizations or new functions.
>> The main reason is that there is a known issue that is still unsolved: we found
>> some dirty pages whose bits were never set in the corresponding bitmap,
>> and this triggers strange problems in the VM.
>> We hope to resolve it before adding more code.
>>
>> You can get the newest integrated qemu colo patches from github:
>> https://github.com/coloft/qemu/commits/colo-v1.1
>
> I thought I'd just say I've got the remotes/origin/colo_huawei_v4.7 off Wen's
> git running here on one of my pair of machines.
>
> This set of hosts is mostly OK running colo; the proxy module still gets
> upset sometimes; but mostly on errors; if the colo-proxy-script fails
> during startup, I get RCU stalls; but if the script works, colo normally
> works on this setup.

Hi Dave,

I'm optimizing the proxy module code. Yes, the 'lock' we used in the proxy
module was a little messy before. I have finished rewriting the communication
between qemu and the proxy, and I have also removed the parameter of the
xt_PMYCOLO module ...

Some further tests still need to be done; I will finish them as soon as
possible. I hope to send the next version next week.

In the meantime, you can test Wen's branch. Except for the block part, there
is no difference in the COLO framework between his branch and mine. ;)

Thanks,
zhanghailiang

>
>> For how to test COLO, please refer to the following link:
>> http://wiki.qemu.org/Features/COLO.
>>
>> Please review and test.
>>
>> Known issue still unsolved:
>> (1) Some pages are dirtied without their corresponding dirty-bitmap bits being set.
>>
>> Previously posted RFC patch series:
>> http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
>> http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html
>> https://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04771.html
>>
>> TODO list:
>> 1 Optimize the checkpoint process to shorten the time it takes:
>>   (Partly done; the patch is not included in this series)
>>    1) separate the ram and device save/load processes to reduce the size of extra memory
>>       used during checkpoint
>>    2) live migrate part of the dirty pages to the slave during sleep time.
>> 2 Add more debug/stat info
>>   (Partly done; the patch is not included in this series)
>>   including checkpoint count, proxy discompare count, downtime,
>>   number of live migrated pages, total sent pages, etc.
>> 3 Strengthen failover
>> 4 Optimize the proxy part, including the proxy script.
>> 5 The capability of continuous FT
>>
>> v4:
>> - New block replication scheme (use image-fleecing for the secondary side)
>> - Address some comments from Eric Blake and Dave
>> - Add command colo-set-checkpoint-period to set the period of periodic checkpoints
>> - Add a delay (100ms) between continuous checkpoint requests to ensure the VM
>>   runs for at least 100ms since the last pause.
>>
>> v3:
>> - use proxy instead of colo agent to compare network packets
>> - add block replication
>> - Optimize failover disposal
>> - handle shutdown
>>
>> v2:
>> - use QEMUSizedBuffer/QEMUFile as COLO buffer
>> - colo support is enabled by default
>> - add nic replication support
>> - addressed comments from Eric Blake and Dr.
>>   David Alan Gilbert
>>
>> v1:
>> - implement the frame of colo
>>
>> Wen Congyang (1):
>>   COLO: Add block replication into colo process
>>
>> zhanghailiang (27):
>>   configure: Add parameter for configure to enable/disable COLO support
>>   migration: Introduce capability 'colo' to migration
>>   COLO: migrate colo related info to slave
>>   migration: Integrate COLO checkpoint process into migration
>>   migration: Integrate COLO checkpoint process into loadvm
>>   COLO: Implement colo checkpoint protocol
>>   COLO: Add a new RunState RUN_STATE_COLO
>>   QEMUSizedBuffer: Introduce two help functions for qsb
>>   COLO: Save VM state to slave when do checkpoint
>>   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
>>   COLO VMstate: Load VM state into qsb before restore it
>>   arch_init: Start to trace dirty pages of SVM
>>   COLO RAM: Flush cached RAM into SVM's memory
>>   COLO failover: Introduce a new command to trigger a failover
>>   COLO failover: Implement COLO master/slave failover work
>>   COLO failover: Don't do failover during loading VM's state
>>   COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
>>   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
>>   COLO NIC: Implement colo nic device interface configure()
>>   COLO NIC : Implement colo nic init/destroy function
>>   COLO NIC: Some init work related with proxy module
>>   COLO: Do checkpoint according to the result of net packets comparing
>>   COLO: Improve checkpoint efficiency by do additional periodic
>>     checkpoint
>>   COLO: Add colo-set-checkpoint-period command
>>   COLO NIC: Implement NIC checkpoint and failover
>>   COLO: Disable qdev hotplug when VM is in COLO mode
>>   COLO: Implement shutdown checkpoint
>>
>>  arch_init.c                            | 199 +++++++-
>>  configure                              |  14 +
>>  hmp-commands.hx                        |  30 ++
>>  hmp.c                                  |  14 +
>>  hmp.h                                  |   2 +
>>  include/exec/cpu-all.h                 |   1 +
>>  include/migration/migration-colo.h     |  58 +++
>>  include/migration/migration-failover.h |  22 +
>>  include/migration/migration.h          |   3 +
>>  include/migration/qemu-file.h          |   3 +-
>>  include/net/colo-nic.h                 |  25 +
>>  include/net/net.h                      |   4 +
>>  include/sysemu/sysemu.h                |   3 +
>>  migration/Makefile.objs                |   2 +
>>  migration/colo-comm.c                  |  80 ++++
>>  migration/colo-failover.c              |  48 ++
>>  migration/colo.c                       | 809 +++++++++++++++++++++++++++++++++
>>  migration/migration.c                  |  60 ++-
>>  migration/qemu-file-buf.c              |  58 +++
>>  net/Makefile.objs                      |   1 +
>>  net/colo-nic.c                         | 438 ++++++++++++++++++
>>  net/tap.c                              |  45 +-
>>  qapi-schema.json                       |  42 +-
>>  qemu-options.hx                        |  10 +-
>>  qmp-commands.hx                        |  41 ++
>>  savevm.c                               |   2 +-
>>  scripts/colo-proxy-script.sh           |  97 ++++
>>  stubs/Makefile.objs                    |   1 +
>>  stubs/migration-colo.c                 |  58 +++
>>  vl.c                                   |  36 +-
>>  30 files changed, 2178 insertions(+), 28 deletions(-)
>>  create mode 100644 include/migration/migration-colo.h
>>  create mode 100644 include/migration/migration-failover.h
>>  create mode 100644 include/net/colo-nic.h
>>  create mode 100644 migration/colo-comm.c
>>  create mode 100644 migration/colo-failover.c
>>  create mode 100644 migration/colo.c
>>  create mode 100644 net/colo-nic.c
>>  create mode 100755 scripts/colo-proxy-script.sh
>>  create mode 100644 stubs/migration-colo.c
>>
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>