From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:53974)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <walid.nouri@gmail.com>) id 1XGtIP-0004z6-P2
	for qemu-devel@nongnu.org; Mon, 11 Aug 2014 13:22:18 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <walid.nouri@gmail.com>) id 1XGtIG-00013N-NJ
	for qemu-devel@nongnu.org; Mon, 11 Aug 2014 13:22:09 -0400
Received: from mail-wg0-x229.google.com ([2a00:1450:400c:c00::229]:62740)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <walid.nouri@gmail.com>) id 1XGtIG-00010E-DY
	for qemu-devel@nongnu.org; Mon, 11 Aug 2014 13:22:00 -0400
Received: by mail-wg0-f41.google.com with SMTP id z12so8829544wgg.0
	for <qemu-devel@nongnu.org>; Mon, 11 Aug 2014 10:21:59 -0700 (PDT)
Message-ID: <53E8FBBD.7050703@gmail.com>
Date: Mon, 11 Aug 2014 19:22:05 +0200
From: Walid Nouri <walid.nouri@gmail.com>
MIME-Version: 1.0
References: <53D8FF52.9000104@gmail.com> <1406820870.2680.3.camel@usa>			
	<53DBE726.4050102@gmail.com> <1406947532.2680.11.camel@usa>		
	<53E0AA60.9030404@gmail.com> <1407376929.21497.2.camel@usa>	
	<53E60F34.1070607@gmail.com> <1407587152.24027.5.camel@usa>
In-Reply-To: <1407587152.24027.5.camel@usa>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State
	consistency
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org, michael@hinespot.com

Hi,
I will do my best to make a contribution :-)

Are there feasible alternatives to DRBD for replicating local storage,
ideally something built directly into QEMU?

Walid

On 09.08.2014 14:25, Michael R. Hines wrote:
> On Sat, 2014-08-09 at 14:08 +0200, Walid Nouri wrote:
>> Hi Michael,
>> how is the weather in Beijing? :-)
> It's terrible. Lots of pollution =(
>
>> May I ask you some questions about your MC implementation?
>>
>> Currently I'm trying to understand how the MC protocol works in general
>> and which problems can occur, so that I can discuss them in my thesis.
>>
>> As far as I understand, MC relies on a shared disk. Disk output of the
>> primary VM is written directly, while network output is buffered until
>> the corresponding checkpoint is acknowledged.
>>
>> One problem that comes to mind is: what happens when the primary VM
>> writes to the disk and crashes before sending the corresponding checkpoint?
>>
> The MC implementation itself is incomplete, today. (I need help).
>
> The Xen Remus implementation uses the DRBD system to "mirror" all disk
> writes to the source and destination before completing each checkpoint.
>
> The KVM (mc) implementation needs exactly the same support, but it is
> missing today.
>
> Until that happens, we are *required* to use root-over-iSCSI or
> root-over-NFS (meaning that the guest filesystem is mounted directly
> inside the virtual machine without the host knowing about it).
>
> This has the effect of translating all disk I/O into network I/O,
> and since network I/O is already buffered, we are safe.
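
To illustrate the buffering argument above, here is a minimal sketch of output buffering per checkpoint epoch. It is not taken from the MC code; the class `OutputBuffer` and its method names are hypothetical, and packets are plain strings. The point is only that a guest disk write travelling over iSCSI or NFS is an ordinary outgoing packet, so it is held back until the checkpoint covering it has been acknowledged.

```python
from collections import deque


class OutputBuffer:
    """Hypothetical model: hold outgoing packets until their epoch is acknowledged."""

    def __init__(self):
        self.pending = deque()   # (epoch, packet) pairs, in send order
        self.released = []       # packets that actually left the host

    def send(self, epoch, packet):
        # Guest output produced during `epoch` is queued, not transmitted.
        self.pending.append((epoch, packet))

    def checkpoint_acked(self, epoch):
        # The secondary now holds state up to `epoch`, so output from that
        # epoch (and any earlier one) may be released.
        while self.pending and self.pending[0][0] <= epoch:
            self.released.append(self.pending.popleft()[1])

    def primary_failed(self):
        # Output of unacknowledged epochs is dropped; the outside world
        # (including the iSCSI/NFS server) never saw it.
        self.pending.clear()


buf = OutputBuffer()
buf.send(1, "iscsi-write-A")   # disk write issued in epoch 1
buf.send(2, "iscsi-write-B")   # disk write issued in epoch 2
buf.checkpoint_acked(1)        # only epoch 1 is committed on the secondary
buf.primary_failed()           # crash before checkpoint 2 is acknowledged
```

In this model the storage server only ever receives `iscsi-write-A`, which matches the state the secondary resumes from.
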
>
>
>> Here is an example: the primary is in the current epoch (n), and the
>> secondary's state is at epoch (n-1). The primary writes to disk and
>> crashes before or while sending checkpoint n. In this case the
>> secondary's memory state is still at epoch (n-1), while the state of the
>> shared disk corresponds to the primary's state at epoch (n).
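
The scenario above can be written down as a toy model. This is only a sketch under simplifying assumptions: the shared disk and the guest's in-memory view of it are plain dictionaries, and `guest_write` and `send_checkpoint` are hypothetical helpers, not QEMU functions.

```python
shared_disk = {}                      # visible to both primary and secondary
primary_mem = {}                      # primary's in-memory view of what is on disk
backup = {"epoch": 0, "mem": {}}      # last checkpoint held by the secondary


def guest_write(block, data):
    # The disk write goes straight to the shared disk; it is NOT buffered.
    primary_mem[block] = data
    shared_disk[block] = data


def send_checkpoint(epoch):
    # The secondary receives a copy of the primary's memory state.
    backup["epoch"] = epoch
    backup["mem"] = dict(primary_mem)


guest_write("block7", "v1")
send_checkpoint(1)                    # epoch (n-1) = 1 is committed

guest_write("block7", "v2")           # happens during epoch n = 2
# ... primary crashes here, checkpoint 2 is never sent ...

# After failover the secondary resumes from epoch 1 and compares its view
# of the disk with what is actually stored there.
consistent = all(shared_disk.get(b) == d for b, d in backup["mem"].items())
```

Here `consistent` ends up `False`: the secondary's memory expects `v1` in `block7`, but the shared disk already contains `v2`.
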
>>
>> How does MC guarantee that the disk state of the backup VM is consistent
>> with its memory state?
> As I mentioned above, we need the equivalent of the Xen solution, but I
> just haven't had the time to write it (or incorporate someone else's
> implementation). Patch is welcome =)
>
>> Is memory-VCPU / disk state consistency necessary under all circumstances?
>> Or can it be neglected because the secondary will (after a failover)
>> repeat the same instructions and eventually write the same data to disk
>> a second time, as the primary did before?
>> Could this lead to fatal inconsistencies?
>>
>> Walid
>>
>
>
>
> - Michael
>
>
>