From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:53974) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XGtIP-0004z6-P2 for qemu-devel@nongnu.org; Mon, 11 Aug 2014 13:22:18 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XGtIG-00013N-NJ for qemu-devel@nongnu.org; Mon, 11 Aug 2014 13:22:09 -0400
Received: from mail-wg0-x229.google.com ([2a00:1450:400c:c00::229]:62740) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XGtIG-00010E-DY for qemu-devel@nongnu.org; Mon, 11 Aug 2014 13:22:00 -0400
Received: by mail-wg0-f41.google.com with SMTP id z12so8829544wgg.0 for ; Mon, 11 Aug 2014 10:21:59 -0700 (PDT)
Message-ID: <53E8FBBD.7050703@gmail.com>
Date: Mon, 11 Aug 2014 19:22:05 +0200
From: Walid Nouri
MIME-Version: 1.0
References: <53D8FF52.9000104@gmail.com> <1406820870.2680.3.camel@usa> <53DBE726.4050102@gmail.com> <1406947532.2680.11.camel@usa> <53E0AA60.9030404@gmail.com> <1407376929.21497.2.camel@usa> <53E60F34.1070607@gmail.com> <1407587152.24027.5.camel@usa>
In-Reply-To: <1407587152.24027.5.camel@usa>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
To: qemu-devel@nongnu.org, michael@hinespot.com

Hi,

I will do my best to make a contribution :-)

Are there alternative ways of replicating local storage, other than DRBD,
that might be feasible? Perhaps some that are built directly into QEMU?

Walid

On 09.08.2014 at 14:25, Michael R. Hines wrote:
> On Sat, 2014-08-09 at 14:08 +0200, Walid Nouri wrote:
>> Hi Michael,
>> how is the weather in Beijing? :-)
> It's terrible. Lots of pollution =(
>
>> May I ask you some questions about your MC implementation?
>>
>> Currently I'm trying to understand how the MC protocol works in
>> general, and which problems can occur, so that I can discuss it
>> in my thesis.
>>
>> As far as I have understood, MC relies on a shared disk. Disk output of
>> the primary VM is written directly, while network output is buffered
>> until the corresponding checkpoint is acknowledged.
>>
>> One problem that comes to mind: what happens when the primary VM
>> writes to the disk and crashes before sending the corresponding checkpoint?
>>
> The MC implementation itself is incomplete today. (I need help.)
>
> The Xen Remus implementation uses the DRBD system to "mirror" all disk
> writes to the source and destination before completing each checkpoint.
>
> The KVM (MC) implementation needs exactly the same support, but it is
> missing today.
>
> Until that happens, we are *required* to use root-over-iSCSI or
> root-over-NFS (meaning that the guest filesystem is mounted directly
> inside the virtual machine without the host knowing about it).
>
> This has the effect of translating all disk I/O into network I/O,
> and since network I/O is already buffered, we are safe.
>
>
>> Here is an example: the primary state is in the current epoch (n), and
>> the secondary state is in epoch (n-1). The primary writes to disk and
>> crashes before or while sending checkpoint n. In this case the
>> secondary memory state is still at epoch (n-1), while the state of the
>> shared disk corresponds to the primary state of epoch (n).
>>
>> How does MC guarantee that the disk state of the backup VM is consistent
>> with its memory state?
> As I mentioned above, we need the equivalent of the Xen solution, but I
> just haven't had the time to write it (or incorporate someone else's
> implementation). Patches are welcome =)
>
>> Is Memory-VCPU / Disk State consistency necessary under all circumstances?
>> Or can this be neglected because the secondary will (after a failover)
>> repeat the same instructions and eventually write the same data to disk
>> a second time (as the primary did before)?
>> Could this lead to fatal inconsistencies?
>>
>> Walid
>>
>
> - Michael
>
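
To make the failure case from this thread concrete, here is a minimal sketch of per-epoch output buffering: outputs produced during an epoch are held back and released only once the checkpoint for that epoch is acknowledged. The class `EpochBuffer` and its methods are hypothetical names chosen for illustration. They are not part of QEMU, the MC patches, Remus, or DRBD, and the sketch only models the idea that disk writes would need the same treatment that network output already gets.

```python
from collections import defaultdict


class EpochBuffer:
    """Hypothetical model of per-epoch output buffering (not QEMU code)."""

    def __init__(self):
        self.epoch = 0                     # epoch the primary is executing
        self.pending = defaultdict(list)   # epoch -> outputs not yet released
        self.released = []                 # outputs visible outside the VM

    def emit(self, output):
        # Hold back every output produced in the current epoch.
        self.pending[self.epoch].append(output)

    def checkpoint_acked(self):
        # The secondary acknowledged the checkpoint for the current epoch:
        # release that epoch's outputs and start the next epoch.
        self.released.extend(self.pending.pop(self.epoch, []))
        self.epoch += 1

    def primary_failed(self):
        # The primary died before the checkpoint was acknowledged:
        # discard unreleased outputs so nothing from the lost epoch leaks.
        self.pending.clear()


buf = EpochBuffer()
buf.emit("write A")
buf.checkpoint_acked()   # epoch 0 committed, "write A" is released
buf.emit("write B")      # produced during epoch 1
buf.primary_failed()     # crash before checkpoint 1 is acknowledged
print(buf.released)      # ['write A']
```

In this model the secondary resumes from the state of the last acknowledged epoch, and only outputs up to that epoch were ever released, so the two stay consistent. A write that goes straight to a shared disk bypasses the buffer, which is the inconsistency described in the example with epochs (n) and (n-1).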