From: Walid Nouri
Date: Tue, 19 Aug 2014 10:33:45 +0200
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
To: "Dr. David Alan Gilbert"
Cc: "michael@hinespot.com", "hinesmr@cn.ibm.com", "qemu-devel@nongnu.org", "Michael R. Hines"

Hi,

I have tried to find more information on how to use drive-mirror beyond what is available on the wiki. This was not very satisfactory...

This may sound naive, but are there any code examples in C or any other language, documentation of any kind, (developer) blog entries, presentation videos or any other source of information to get started?
Walid

> On 14.08.2014 at 12:58, "Dr. David Alan Gilbert" wrote:
>
> cc'ing in a couple of the COLOers.
>
> * Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
>>> On 08/13/2014 10:03 PM, Walid Nouri wrote:
>>>
>>> While looking for ideas on approaches to replicating block devices,
>>> I have read the paper about the Remus implementation. I think MC can
>>> take a similar approach for local disk.
>>
>> I agree.
>>
>>> Here are the main facts that I have understood:
>>>
>>> Local disk contents are viewed as internal state of the primary and
>>> secondary. To preserve the primary's disk semantics and to allow the
>>> primary to run speculatively, all disk state changes are written
>>> directly to the disk and, in parallel and asynchronously, sent to the
>>> secondary. The secondary keeps the pending write requests in two disk
>>> buffers: a speculation buffer and a write-out buffer.
>>>
>>> On receiving the next checkpoint, the secondary copies the speculation
>>> buffer to the write-out buffer, commits the checkpoint and applies the
>>> write-out buffer to its local disk.
>>>
>>> When the primary fails, the secondary must wait until the write-out
>>> buffer has been completely written to disk before changing its
>>> execution mode to run as primary. In this case (failure of the
>>> primary) the secondary discards the pending disk writes in its
>>> speculation buffer. This protocol keeps the disk state consistent with
>>> the last checkpoint.
>>>
>>> Remus uses the Xen-specific blktap driver. As far as I know, this
>>> can't be used with QEMU (KVM).
>>>
>>> I still have to work out how drive-mirror can be used for this kind of
>>> protocol.
>>
>> That's all correct. Theoretically, we would do exactly the same thing:
>> drive-mirror on the source would write immediately to disk but follow
>> the same commit semantics on the destination as Xen.
>>
>>>
>>> I have taken a look at COLO.
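To make the Remus-style disk buffering described above concrete, here is a minimal C sketch of the secondary's side. It is only an illustration under stated assumptions, not Remus, blktap or QEMU code: the local disk is modelled as an in-memory array, checkpoints are plain function calls, and all names and sizes (struct secondary_disk, receive_write(), commit_checkpoint(), apply_write_out(), failover(), NUM_BLOCKS, BLOCK_SIZE, MAX_PENDING) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define NUM_BLOCKS  16   /* blocks on the simulated local disk */
#define BLOCK_SIZE  512  /* bytes per block */
#define MAX_PENDING 64   /* capacity of each buffer, in writes */

struct pending_write {
    size_t block;                    /* target block number */
    unsigned char data[BLOCK_SIZE];  /* payload mirrored from the primary */
};

struct write_buffer {
    struct pending_write w[MAX_PENDING];
    size_t head;   /* next entry to apply (only used by the write-out buffer) */
    size_t count;  /* entries stored */
};

struct secondary_disk {
    unsigned char disk[NUM_BLOCKS][BLOCK_SIZE]; /* simulated local disk */
    struct write_buffer speculation;  /* writes received since the last checkpoint */
    struct write_buffer write_out;    /* committed writes not yet on disk */
};

/* A write arrives from the primary: hold it until the next checkpoint.
 * data must point to BLOCK_SIZE bytes. Returns false if it cannot be queued. */
bool receive_write(struct secondary_disk *s, size_t block,
                   const unsigned char *data)
{
    struct write_buffer *b = &s->speculation;
    if (block >= NUM_BLOCKS || b->count >= MAX_PENDING)
        return false;
    b->w[b->count].block = block;
    memcpy(b->w[b->count].data, data, BLOCK_SIZE);
    b->count++;
    return true;
}

/* Checkpoint received: append the speculative writes to the write-out
 * buffer in arrival order and empty the speculation buffer. */
bool commit_checkpoint(struct secondary_disk *s)
{
    struct write_buffer *spec = &s->speculation, *out = &s->write_out;
    if (out->count + spec->count > MAX_PENDING)
        return false;  /* write-out buffer must be drained first */
    memcpy(&out->w[out->count], spec->w, spec->count * sizeof spec->w[0]);
    out->count += spec->count;
    spec->count = 0;
    return true;
}

/* Apply up to max committed writes to the disk, oldest first.
 * Returns the number of committed writes still waiting. */
size_t apply_write_out(struct secondary_disk *s, size_t max)
{
    struct write_buffer *out = &s->write_out;
    while (max-- > 0 && out->head < out->count) {
        const struct pending_write *p = &out->w[out->head++];
        memcpy(s->disk[p->block], p->data, BLOCK_SIZE);
    }
    size_t remaining = out->count - out->head;
    if (remaining == 0)
        out->head = out->count = 0;  /* fully drained: reuse from the start */
    return remaining;
}

/* Primary failed: drop writes no checkpoint covers, then finish every
 * committed write before the secondary may run as the new primary. */
void failover(struct secondary_disk *s)
{
    s->speculation.count = 0;
    apply_write_out(s, SIZE_MAX);
}
```

In this sketch a caller would invoke receive_write() for each mirrored write, commit_checkpoint() when a checkpoint arrives, apply_write_out() in the background, and failover() once the primary is declared dead. Keeping the two buffers separate is what lets failover drop exactly the writes that no checkpoint covers while still completing the committed ones.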
>>
>>> IMHO there are two points. First, custom changes to the TCP stack are
>>> a no-go for proprietary operating systems like Windows: this makes
>>> COLO application-agnostic but not operating-system-agnostic. Second,
>>> with I/O-intensive workloads COLO will tend to behave like MC. This is
>>> my point of view, but I didn't invest much time in understanding
>>> everything in detail.
>>
>> Actually, if I remember correctly, the TCP stack is only modified at
>> the hypervisor level: they intercept and translate TCP sequence numbers
>> "in flight" to detect divergence of the source and destination, which
>> is not a big problem if the implementation is done well.
>
> The 2013 paper says:
> 'COLO modifies the guest OS’s TCP/IP stack in order to make the
> behavior more deterministic.'
> but does say that an alternative might be to have a
> 'comparison function that operates transparently over re-assembled
> TCP streams'
>
>> My hope for the future was that the two approaches could be used in a
>> "hybrid" manner; actually, MC has much more of a performance hit for
>> I/O than COLO does because of its buffering requirements.
>>
>> On the other hand, MC would perform better in memory-intensive or
>> CPU-intensive situations, so maybe QEMU could "switch" between the two
>> mechanisms at different points in time as the resource bottleneck
>> changes.
>
> If the primary were to rate-limit the number of resynchronisations
> (and send the secondary a message as soon as it knew a resync was
> needed), that would get some of the way, but then the only difference
> from microcheckpointing at that point is the secondary doing a wasteful
> copy and sending the packets across; it seems it should be easy to
> disable those if it knew that a resync was going to happen.
>
> Dave
>
>> - Michael
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
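The paper's alternative of a comparison function over re-assembled TCP streams, combined with Dave's idea of rate-limiting resynchronisations, can be sketched together in C. This is a hypothetical illustration, not COLO's or QEMU's code: it assumes the outbound streams of both VMs for one connection have already been reassembled into byte buffers (reassembly itself is not shown), and compare_streams(), struct resync_limiter and may_resync_now() are invented names. The identical prefix can be released to the client; a mismatch means a resync (checkpoint) is needed, and the limiter decides whether one may start now.

```c
#include <stdbool.h>
#include <stddef.h>

/* Outcome of comparing one connection's reassembled outbound streams. */
struct compare_result {
    size_t releasable;  /* length of the identical prefix: safe to send */
    bool diverged;      /* true if the streams differ within their common length */
};

/* Compare what the primary and secondary have sent so far on one connection.
 * A shorter stream is not a divergence: the lagging VM may still produce
 * the same bytes later. */
struct compare_result compare_streams(const unsigned char *primary, size_t plen,
                                      const unsigned char *secondary, size_t slen)
{
    struct compare_result r = { 0, false };
    size_t n = plen < slen ? plen : slen;
    while (r.releasable < n && primary[r.releasable] == secondary[r.releasable])
        r.releasable++;
    r.diverged = r.releasable < n;
    return r;
}

/* Hypothetical rate limiter: at most one resync per min_interval_ms. */
struct resync_limiter {
    long long last_resync_ms;   /* time of the last permitted resync */
    long long min_interval_ms;  /* minimum gap between resyncs */
};

/* Returns true (and records the time) if a resync may start at now_ms. */
bool may_resync_now(struct resync_limiter *l, long long now_ms)
{
    if (now_ms - l->last_resync_ms < l->min_interval_ms)
        return false;  /* too soon: output must stay buffered for now */
    l->last_resync_ms = now_ms;
    return true;
}
```

When may_resync_now() refuses, the diverged output would presumably have to be held until the next permitted checkpoint, which is effectively MC-style buffering; this is the sense in which a rate-limited COLO converges towards microcheckpointing.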