From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Subject: Re: Question on skip_emulated_instructions()
Date: Thu, 08 Apr 2010 14:27:53 +0900
Message-ID: <4BBD6959.6080003@lab.ntt.co.jp>
References: <4BBAB46B.9010405@lab.ntt.co.jp> <20100406100522.GW5235@redhat.com>	 <r2l87e9effc1004062325jba1026e1v2598333ccfa51964@mail.gmail.com>	 <20100407154324.GF303@redhat.com> <v2u87e9effc1004071021udb91678bm762ea3289ad818a1@mail.gmail.com> <4BBCC2C9.1040301@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Gleb Natapov <gleb@redhat.com>, kvm@vger.kernel.org,
	Marcelo Tosatti <mtosatti@redhat.com>
To: Avi Kivity <avi@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from tama50.ecl.ntt.co.jp ([129.60.39.147]:63396 "EHLO
	tama50.ecl.ntt.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751506Ab0DHF2J (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 8 Apr 2010 01:28:09 -0400
In-Reply-To: <4BBCC2C9.1040301@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Avi Kivity wrote:
> On 04/07/2010 08:21 PM, Yoshiaki Tamura wrote:
>>
>> The problem here is that, I needed to transfer the VM state which is
>> just *before* the output to the devices. Otherwise, the VM state has
>> already been proceeded, and after failover, some I/O didn't work as I
>> expected.
>> I tracked down this issue, and figured out rip was already proceeded
>> in KVM,
>> and transferring this VCPU state was meaningless.
>>
>> I'm planning to post the patch set of Kemari soon, but I would like to
>> solve
>> this rip issue before that. If there is no drawback, I'm happy to work
>> and post a patch.
>
> vcpu state is undefined when an mmio operation is pending,
> Documentation/kvm/api.txt says the following:
>
>> NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding
>> operations are complete (and guest state is consistent) only after
>> userspace
>> has re-entered the kernel with KVM_RUN. The kernel side will first finish
>> incomplete operations and then check for pending signals. Userspace
>> can re-enter the guest with an unmasked signal pending to complete
>> pending operations.

Thanks for the information.

So the point is that the vcpu state observed from qemu upon 
KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI should not be used because it is not 
complete/consistent?

> Currently we complete instructions for output operations and leave them
> incomplete for input operations. Deferring completion for output
> operations should work, except it may break the vmware backdoor port
> (see hw/vmport.c), which changes register state following an output
> instruction, and KVM_EXIT_TPR_ACCESS, where userspace reads the state
> following a write instruction.
>
> Do you really need to transfer the vcpu state before the instruction, or
> do you just need a consistent state? If the latter, then you can get
> away by posting a signal and re-entering the guest. kvm will complete
> the instruction and exit immediately, and you will have fully consistent
> state.

The requirement is that the guest must always be able to replay at least the 
instruction which triggered the synchronization on the primary.  From that point 
of view, I think I need to transfer the vcpu state before the instruction.  If I 
post a signal and let the guest or emulator proceed, I'm not sure whether the 
guest on the secondary can replay it as expected.  Please point out if I am 
misunderstanding.