From: Yoshiaki Tamura
Subject: Re: Question on skip_emulated_instructions()
Date: Thu, 08 Apr 2010 18:14:44 +0900
Message-ID: <4BBD9E84.6040307@lab.ntt.co.jp>
In-Reply-To: <4BBD966D.7060801@redhat.com>
References: <4BBAB46B.9010405@lab.ntt.co.jp> <20100406100522.GW5235@redhat.com>
 <20100407154324.GF303@redhat.com> <4BBCC2C9.1040301@redhat.com>
 <4BBD6959.6080003@lab.ntt.co.jp> <4BBD82ED.9010105@redhat.com>
 <20100408071953.GI303@redhat.com> <4BBD8F74.8070401@lab.ntt.co.jp>
 <4BBD966D.7060801@redhat.com>
To: Avi Kivity
Cc: Gleb Natapov, kvm@vger.kernel.org, Marcelo Tosatti

Avi Kivity wrote:
> On 04/08/2010 11:30 AM, Yoshiaki Tamura wrote:
>>
>> If I transferred a VM after I/O operations, let's say the VM sent a
>> TCP ACK to the client, and if a hardware failure occurred to the
>> primary during the VM transferring *but the client received the TCP
>> ACK*, the secondary will resume from the previous state, and it may
>> need to receive some data from the client. However, because the client
>> has already received the TCP ACK, it won't resend the data to the
>> secondary. It looks this data is going to be dropped. Am I missing
>> some point here?
>>
>
> I think you should block I/O not at the cpu/device boundary (that's
> inefficient as many cpu I/O instructions don't necessarily cause
> externally visible I/O) but at the device level. Whenever the network
> device wants to send out a packet, halt the guest (letting any I/O
> instructions complete), synchronize the secondary, and then release the
> pending I/O.
> This ensures that the secondary has all of the data prior
> to the ack being sent out.

I was planning to clean up my current code first, but maybe I should post
its current status now for explanation. As you mentioned, I'm capturing I/O
at the device level, by inserting a hook inside the PIO/MMIO handlers of the
virtio-blk, virtio-net and e1000 emulators. Since it's implemented naively,
it stalls (meaning I/O instructions are delayed) until the VM transfer is
done.

So what I can do here is:

1. Let I/O instructions complete both at qemu and kvm.
2. Transfer the guest state.
   # VCPU and device model think I/O emulation is already done.
3. Finally release the pending output to the real world.

>>>> If the responses to the mmio or pio request are exactly the same,
>>>> then the replay will happen exactly the same.
>>
>>
>> I agree. What I'm wondering is how can we guarantee that the responses
>> are the same...
>
> I don't think you can in the general case. But if you gate output at the
> device level, instead of the instruction level, the problem goes away, no?

Yes, it should.
To implement this, step 3 needs to be called asynchronously. If qemu is
already handling I/O asynchronously, this should be relatively easy to do.
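
To make the ordering of the three steps concrete, here is a minimal sketch
written as a toy Python model, not qemu code. The names GatedNic, checkpoint,
wire and secondary are all hypothetical and exist only for illustration; the
real hooks would live in the device emulators, and the state transfer is
reduced to a dictionary copy.

```python
from collections import deque


class GatedNic:
    """Toy network device that holds output until a checkpoint commits.

    Hypothetical model for illustration; not a qemu device.
    """

    def __init__(self, wire):
        self.wire = wire        # list standing in for the real network
        self.pending = deque()  # output produced since the last checkpoint

    def send(self, packet):
        # Step 1: the guest's I/O instruction completes right away.
        # The packet is only queued, so nothing is externally visible yet.
        self.pending.append(packet)

    def release(self):
        # Step 3: flush the queued output in its original order.
        while self.pending:
            self.wire.append(self.pending.popleft())


def checkpoint(guest_state, nic, secondary):
    """Copy the guest state to the secondary, then release gated output."""
    # Step 2: transfer the state. The vCPU and device model already
    # consider the I/O emulation done at this point.
    secondary.clear()
    secondary.update(guest_state)
    # Only after the secondary is in sync does the output leave the primary.
    nic.release()


wire, secondary = [], {}
nic = GatedNic(wire)
guest_state = {"tcp_acked": 1}  # the guest has consumed data and ACKs it

nic.send("ACK 1")               # wire is still empty after this call
checkpoint(guest_state, nic, secondary)
```

In this model the client can only see "ACK 1" once the secondary holds the
state in which that data was received, which is the property discussed
above. Making release() run asynchronously is the part that depends on how
qemu already handles I/O.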