From: Anthony Liguori
Date: Thu, 22 Apr 2010 15:33:25 -0500
Message-ID: <4BD0B295.3010004@codemonkey.ws>
Subject: Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1
To: Yoshiaki Tamura
Cc: ohmura.kei@lab.ntt.co.jp, mtosatti@redhat.com, kvm@vger.kernel.org,
    dlaor@redhat.com, qemu-devel@nongnu.org, yoshikawa.takuya@oss.ntt.co.jp,
    aliguori@us.ibm.com, avi@redhat.com
References: <1271829445-5328-1-git-send-email-tamura.yoshiaki@lab.ntt.co.jp>
    <4BD00FBA.5040604@redhat.com> <4BD02684.8060202@lab.ntt.co.jp>
    <4BD03ED8.707@redhat.com>

On 04/22/2010 08:16 AM, Yoshiaki Tamura wrote:
> 2010/4/22 Dor Laor:
>> On 04/22/2010 01:35 PM, Yoshiaki Tamura wrote:
>>> Dor Laor wrote:
>>>> On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
>>>>> Hi all,
>>>>>
>>>>> We have been implementing the prototype of Kemari for KVM, and
>>>>> we're sending this message to share what we have now and our TODO
>>>>> lists.
>>>>> We would like to get early feedback to keep us going in the
>>>>> right direction. Although the advanced approaches in the TODO
>>>>> lists are fascinating, we would like to run this project step by
>>>>> step while absorbing comments from the community. The current
>>>>> code is based on qemu-kvm.git
>>>>> 2b644fd0e737407133c88054ba498e772ce01f27.
>>>>>
>>>>> For those who are new to Kemari for KVM, please take a look at
>>>>> the following RFC, which we posted last year.
>>>>>
>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg25022.html
>>>>>
>>>>> The transmission/transaction protocol and most of the control
>>>>> logic are implemented in QEMU. However, we needed a hack in KVM
>>>>> to prevent rip from proceeding before synchronizing VMs. It may
>>>>> also need some plumbing on the kernel side to guarantee
>>>>> replayability of certain events and instructions, to integrate
>>>>> the RAS capabilities of newer x86 hardware with the HA stack, and
>>>>> for optimization purposes, for example.
>>>>
>>>> [snip]
>>>>
>>>>> The rest of this message describes the TODO lists grouped by
>>>>> topic.
>>>>>
>>>>> === event tapping ===
>>>>>
>>>>> Event tapping is the core component of Kemari; it decides on
>>>>> which event the primary should synchronize with the secondary.
>>>>> The basic assumption here is that outgoing I/O operations are
>>>>> idempotent, which is usually true for disk I/O and reliable
>>>>> network protocols such as TCP.
>>>>
>>>> IMO any type of network event should be stalled too. What if the
>>>> VM runs a non-TCP protocol, and a packet the master node sent
>>>> reached some remote client, but the master failed before syncing
>>>> to the slave?
>>>
>>> The current implementation actually stalls any type of network
>>> traffic that goes through virtio-net.
>>>
>>> However, if the application was using unreliable protocols, it
>>> should have its own recovery mechanism, or it should be completely
>>> stateless.
>>
>> Why do you treat TCP differently? You can damage the entire VM this
>> way - think of a dhcp request that was dropped at the moment you
>> switched between the master and the slave.
>
> I'm not trying to say that we should treat TCP differently, just that
> it's severe. In the case of a dhcp request, the client would have a
> chance to retry after failover, correct?
> BTW, in current implementation,

I'm slightly confused about the current implementation vs. my
recollection of the original paper with Xen. I had thought that all
disk and network I/O was buffered in such a way that at each
checkpoint, the I/O operations would be released in a burst.
Otherwise, you would have to synchronize after every I/O operation,
which is what the current implementation seems to do.

I'm not sure how that is accomplished atomically, though, since you
could have a completed I/O operation duplicated on the slave node,
provided it didn't notify completion prior to failure.

Is there another Kemari component that somehow handles buffering I/O
that is not obvious from these patches?

Regards,

Anthony Liguori