From: Anthony Liguori
Date: Thu, 22 Apr 2010 15:33:25 -0500
Message-ID: <4BD0B295.3010004@codemonkey.ws>
Subject: Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1
To: Yoshiaki Tamura
Cc: ohmura.kei@lab.ntt.co.jp, mtosatti@redhat.com, kvm@vger.kernel.org,
    dlaor@redhat.com, qemu-devel@nongnu.org, yoshikawa.takuya@oss.ntt.co.jp,
    aliguori@us.ibm.com, avi@redhat.com
References: <1271829445-5328-1-git-send-email-tamura.yoshiaki@lab.ntt.co.jp>
    <4BD00FBA.5040604@redhat.com> <4BD02684.8060202@lab.ntt.co.jp>
    <4BD03ED8.707@redhat.com>

On 04/22/2010 08:16 AM, Yoshiaki Tamura wrote:
> 2010/4/22 Dor Laor:
>> On 04/22/2010 01:35 PM, Yoshiaki Tamura wrote:
>>> Dor Laor wrote:
>>>> On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
>>>>> Hi all,
>>>>>
>>>>> We have been implementing the prototype of Kemari for KVM, and
>>>>> we're sending this message to share what we have now and our TODO
>>>>> lists.
>>>>> We would like to get early feedback to keep us going in the
>>>>> right direction. Although the advanced approaches in the TODO
>>>>> lists are fascinating, we would like to run this project step by
>>>>> step while absorbing comments from the community. The current
>>>>> code is based on qemu-kvm.git
>>>>> 2b644fd0e737407133c88054ba498e772ce01f27.
>>>>>
>>>>> For those who are new to Kemari for KVM, please take a look at
>>>>> the following RFC, which we posted last year.
>>>>>
>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg25022.html
>>>>>
>>>>> The transmission/transaction protocol and most of the control
>>>>> logic are implemented in QEMU. However, we needed a hack in KVM
>>>>> to prevent rip from proceeding before synchronizing VMs. It may
>>>>> also need some plumbing on the kernel side to guarantee
>>>>> replayability of certain events and instructions, to integrate
>>>>> the RAS capabilities of newer x86 hardware with the HA stack, and
>>>>> for optimization purposes, for example.
>>>>
>>>> [snip]
>>>>
>>>>> The rest of this message describes the TODO lists grouped by
>>>>> topic.
>>>>>
>>>>> === event tapping ===
>>>>>
>>>>> Event tapping is the core component of Kemari; it decides on
>>>>> which event the primary should synchronize with the secondary.
>>>>> The basic assumption here is that outgoing I/O operations are
>>>>> idempotent, which is usually true for disk I/O and reliable
>>>>> network protocols such as TCP.
>>>>
>>>> IMO any type of network event should be stalled too. What if the
>>>> VM runs a non-TCP protocol, and a packet the master node sent
>>>> reached some remote client, but the master failed before syncing
>>>> to the slave?
>>>
>>> The current implementation actually stalls any type of network
>>> traffic that goes through virtio-net.
>>>
>>> However, if the application was using unreliable protocols, it
>>> should have its own recovery mechanism, or it should be completely
>>> stateless.
>>
>> Why do you treat TCP differently? You can damage the entire VM this
>> way - think of a dhcp request that was dropped at the moment you
>> switched between the master and the slave.
>
> I'm not trying to say that we should treat TCP differently, just that
> it's severe. In the case of a dhcp request, the client would have a
> chance to retry after failover, correct?
> BTW, in current implementation,

I'm slightly confused about the current implementation vs. my
recollection of the original paper with Xen. I had thought that all
disk and network I/O was buffered in such a way that at each
checkpoint, the I/O operations would be released in a burst.
Otherwise, you would have to synchronize after every I/O operation,
which is what the current implementation seems to do.

I'm not sure how that is accomplished atomically, though, since you
could have a completed I/O operation duplicated on the slave node,
provided it didn't notify completion prior to failure.

Is there another Kemari component that somehow handles buffering I/O
that is not obvious from these patches?

Regards,

Anthony Liguori