From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Subject: Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1
Date: Fri, 23 Apr 2010 09:20:21 +0900
Message-ID: <4BD0E7C5.4000401@lab.ntt.co.jp>
References: <1271829445-5328-1-git-send-email-tamura.yoshiaki@lab.ntt.co.jp> <4BD00FBA.5040604@redhat.com> <4BD02684.8060202@lab.ntt.co.jp> <20100422161546.GC6265@shareable.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: dlaor@redhat.com, ohmura.kei@lab.ntt.co.jp, kvm@vger.kernel.org,
	mtosatti@redhat.com, aliguori@us.ibm.com, qemu-devel@nongnu.org,
	yoshikawa.takuya@oss.ntt.co.jp, avi@redhat.com
To: Jamie Lokier <jamie@shareable.org>
Return-path: <kvm-owner@vger.kernel.org>
Received: from tama500.ecl.ntt.co.jp ([129.60.39.148]:55058 "EHLO
	tama500.ecl.ntt.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753192Ab0DWAUw (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 22 Apr 2010 20:20:52 -0400
In-Reply-To: <20100422161546.GC6265@shareable.org>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Jamie Lokier wrote:
> Yoshiaki Tamura wrote:
>> Dor Laor wrote:
>>> On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
>>>> Event tapping is the core component of Kemari, and it decides on which
>>>> event the primary should synchronize with the secondary. The basic
>>>> assumption here is that outgoing I/O operations are idempotent, which
>>>> is usually true for disk I/O and reliable network protocols such as
>>>> TCP.
>>>
>>> IMO any type of network event should be stalled too. What if the VM runs
>>> non tcp protocol and the packet that the master node sent reached some
>>> remote client and before the sync to the slave the master failed?
>>
>> In current implementation, it is actually stalling any type of network
>> that goes through virtio-net.
>>
>> However, if the application was using unreliable protocols, it should have
>> its own recovering mechanism, or it should be completely stateless.
>
> Even with unreliable protocols, if slave takeover causes the receiver
> to have received a packet that the sender _does not think it has ever
> sent_, expect some protocols to break.
>
> If the slave replaying master's behaviour since the last sync means it
> will definitely get into the same state of having sent the packet,
> that works out.

That is the behaviour we currently expect.

> But you still have to be careful that the other end's responses to
> that packet are not seen by the slave too early during that replay.
> Otherwise, for example, the slave may observe a TCP ACK to a packet
> that it hasn't yet sent, which is an error.

Even though the current implementation syncs just before network output, what
you pointed out could still happen.  In that case, would the connection be
lost, or would the client/server recover from it?  If the latter, that would
be fine; otherwise, I wonder how people doing similar things handle this
situation.
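
To make the scenario concrete, here is a toy sketch of the check involved.
It is not Kemari or QEMU code: the function name and the sequence numbers
are made up for illustration, wraparound of sequence numbers is ignored,
and the classification follows my reading of the RFC 793 rule that an
acceptable ACK satisfies SND.UNA < SEG.ACK <= SND.NXT.

```python
# Toy model only: hypothetical names, no sequence-number wraparound handling.

def classify_ack(snd_una: int, snd_nxt: int, seg_ack: int) -> str:
    """Classify an incoming ACK against the sender's view of its own stream.

    snd_una: oldest sequence number sent but not yet acknowledged
    snd_nxt: next sequence number the sender would use
    seg_ack: acknowledgment number carried by the incoming segment
    """
    if seg_ack > snd_nxt:
        # The peer acknowledges data this sender believes it never sent.
        return "acks-unsent-data"
    if seg_ack <= snd_una:
        # Nothing new is acknowledged.
        return "duplicate"
    return "acceptable"


# Assumed numbers: the primary had sent up to 3000 before failing, while the
# secondary, restored from the last sync, has only sent up to 2000.
secondary_snd_una, secondary_snd_nxt = 1000, 2000
peer_ack = 3000  # the peer acknowledges what the primary really sent

print(classify_ack(secondary_snd_una, secondary_snd_nxt, peer_ack))
```

As far as I understand RFC 793, a connection in the ESTABLISHED state
answers such a segment with an ACK and drops it rather than resetting, but
I would like to confirm how real stacks behave here.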

> About IP idempotency:
>
> In general, IP packets are allowed to be lost or duplicated in the
> network.  All IP protocols should be prepared for that; it is a basic
> property.
>
> However there is one respect in which they're not idempotent:
>
> The TTL field should be decreased if packets are delayed.  Packets
> should not appear to live in the network for longer than TTL seconds.
> If they do, some protocols (like TCP) can react to the delayed ones
> differently, such as sending a RST packet and breaking a connection.
>
> It is acceptable to reduce TTL faster than the minimum.  After all, it
> is reduced by 1 on every forwarding hop, in addition to time delays.

So the problem is that when the slave takes over, it sends a packet with the
same TTL as one the client may already have received.

>> I currently don't have good numbers that I can share right now.
>> Snapshots/sec depends on what kind of workload is running, and if the
>> guest was almost idle, there will be no snapshots in 5sec.  On the other
>> hand, if the guest was running I/O intensive workloads (netperf, iozone
>> for example), there will be about 50 snapshots/sec.
>
> That is a really satisfying number, thank you :-)
>
> Without this work I wouldn't have imagined that synchronised machines
> could work with such a low transaction rate.

Thank you for your comments.

Although I haven't prepared good data yet, I personally prefer to base the
discussion on an actual implementation and experimental data.