From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20151217203111.GG2484@work-vm>
From: Hailiang Zhang
Message-ID: <567CEAED.6090902@huawei.com>
Date: Fri, 25 Dec 2015 15:06:21 +0800
MIME-Version: 1.0
In-Reply-To: <20151217203111.GG2484@work-vm>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [WIP] RDMA transport for COLO
To: "Dr. David Alan Gilbert"
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com,
 eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
 arei.gonglei@huawei.com, luis@cs.umu.se, amit.shah@redhat.com,
 hongyang.yang@easystack.cn

On 2015/12/18 4:31, Dr. David Alan Gilbert wrote:
> Hi,
>   I've been playing with getting an RDMA setup for COLO and
> have something that mostly works, but it is very new and quite
> hacky; but I thought I'd share my work so far.
>

Nice work, I will look at it later. :)

> You can find it at:
> https://github.com/orbitfp7/qemu/commits/orbit-wp4-colo-dec
>
> What I've done is:
>   a) Wire up a partner TCP connection by the side of the RDMA
>      connection.
>   b) Use the TCP connection just for the responses from secondary->primary
>   c) Make the RDMA connection write to the colo-cache after
>      the first migrate
>   d) Make the RDMA connection notify the secondary when it
>      sends writes, so that the secondary can know that it needs
>      to flush those pages in the colo-cache.
>   e) Add a shutdown function and fix some other bugs
>
> I've had that working on both your current world (which is
> what that tree is based off) and your older COLO world
> from July (with a bit more hacking to make it take the newer
> patches).
>
> Looking at the speed:
>   a) The CPU load on the incoming thread is much lower - maybe
>      only 10-11% instead of 30-40%.
>   b) The performance of guest code is a little slower (~10% slower?)
>      on RDMA rather than TCP (on both 10Gbps and 40Gbps links)
>      I've not worked out why yet. (My guess is it could be to do with
>      RDMA dynamic registration)
>
> Things I know I need to do:
>   1) Tidy it up - it's very messy!
>   2) Try and get rid of the TCP connection and use an RDMA
>      channel for the backwards connection
>   3) Make sure the shutdown really can cope with the other host
>      being dead.
>   4) It only deals with the dynamic registration mode of RDMA;
>      setting pin-all will probably break it.
>   5) Figure out why it's slower!
>   6) Test failover more.
>
> My work on this is part of the EU Orbit project
> ( http://www.orbitproject.eu/ )
>
> Dave
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
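
For reference, items (c) and (d) in the "What I've done" list can be sketched roughly as below. This is only a toy illustration of the idea, not code from the tree above: the ColoCache class, its method names, and the 4 KiB page size are all assumptions made up for the example. An RDMA write is one-sided, so the secondary's CPU does not see it happen; the sketch assumes that this is why an explicit notification is needed before the secondary knows which pages to flush.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class ColoCache:
    """Toy model of the secondary side (hypothetical names, not QEMU code)."""

    def __init__(self, num_pages):
        # Guest RAM and the colo-cache start out identical (all zeroes here).
        self.guest_ram = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        self.cache = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        self.pending = set()  # pages the primary has told us about

    def rdma_write(self, page_index, data):
        # (c) The primary's write lands in the colo-cache, not in guest RAM.
        # Nothing is recorded here: the secondary does not observe the write.
        assert len(data) == PAGE_SIZE
        self.cache[page_index] = data

    def notify_written(self, page_index):
        # (d) The primary tells the secondary which page it wrote.
        self.pending.add(page_index)

    def flush(self):
        # Copy only the notified pages from the cache into guest RAM.
        flushed = sorted(self.pending)
        for i in flushed:
            self.guest_ram[i] = self.cache[i]
        self.pending.clear()
        return flushed


cache = ColoCache(num_pages=8)
cache.rdma_write(3, b"\x01" * PAGE_SIZE)
cache.notify_written(3)
print(cache.flush())
```

In this model a page that was written but never notified stays in the cache and is not copied on flush, which is the situation item (d) is meant to avoid.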