From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi@redhat.com>
Subject: Re: Errors on MMIO read access on VM suspend / resume operations
Date: Thu, 13 Jan 2011 12:22:24 +0200
Message-ID: <4D2ED260.4010801@redhat.com>
References: <4D2C8305.2090609@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Stefan Berger <stefanb@linux.vnet.ibm.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:26899 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753307Ab1AMKWj (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 13 Jan 2011 05:22:39 -0500
In-Reply-To: <4D2C8305.2090609@linux.vnet.ibm.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 01/11/2011 06:19 PM, Stefan Berger wrote:
> Hi!
>
>   I am currently doing some long-term testing of a device model using 
> memory mapped IO (TPM TIS) and am seeing some strange errors when the 
> suspend occurs in the middle of a read operation in the Linux TPM TIS 
> device driver where the driver reads the result packet from the mmio 
> location.
>
>  Short background: The TPM response packet is read in 2 chunks. First 
> the first 10 bytes are read containing the response's header. 
> Subsequently the rest of the packet is read using knowledge of the 
> total size of the response packet from the header (bytes 2-5 in big 
> endian format). The corresponding code reading the data from the 
> hardware interface is here:
>
>
> http://lxr.linux.no/#linux+v2.6.37/drivers/char/tpm/tpm_tis.c#L228
>
>
> The test I am running is setup as follows:
>
> - inside the VM keys are permanently generated by sending commands to 
> the TPM; packets read from the interface are dumped to the screen
>
> - on the host a script suspends the VM every 6 seconds and resumes it 
> immediately afterwards (using libvirt)
>
>
> As it happens, sometimes the VM is suspended in the middle of a read 
> operation on the TPM TIS interface -- see above code reference. I see 
> that because I do dump the state of the TPM TIS when suspending and 
> see that the read offset is pointing to a location somewhere in the 
> middle of the packet - so the TPM TIS Linux driver is in the above 
> loop currently reading the data. I am observing two types of results 
> if this happens:
>
>
> - either the result read by the Linux TPM TIS driver is ok, so no 
> problem here
>
> - or the problematic case where the TPM TIS driver reads a packet with 
> a byte missing and then at the end gets a zero byte from the TPM TIS 
> interface indicating that it read beyond the available data. If the 
> suspend happened while reading the first chunk of data (header), the 
> TPM TIS driver will also complain that the available data for the 2nd  
> chunk (burst size) is less than what's expected  -- it's an off-by-one 
> error
>
>
> So, I then modified the TPM TIS device model to decrement the read 
> offset pointer by '1' in case it was detected that the suspend 
> happened in the middle of the read operation -- in Qemu I do this in 
> the post-load 'method'. This then leads to the following types of 
> results:
>
>
> - the problematic(!) case where the read packet was ok
>
> - the expected case where the TPM TIS driver reads the packet and ends 
> up having two same bytes in the result in consecutive array locations; 
> besides that the TPM TIS driver will in this case complain that it has 
> left-over data
>
>
> So my conclusion from the above tests are:
>
> - for some reason the memory read to the MMIO location happens as the 
> last instruction executed on suspend and again as the very first on 
> executed on resume. This explains to me that the TPM TIS model 
> internal pointer into the packet was advanced by '1' (the packet is 
> read by subsequently reading from the same memory location) and the 
> above problematic cases make sense


Most likely this is qemu-kvm failing to obey this snippet from 
Documentation/kvm/api.txt:

> NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding
> operations are complete (and guest state is consistent) only after 
> userspace
> has re-entered the kernel with KVM_RUN.  The kernel side will first finish
> incomplete operations and then check for pending signals.  Userspace
> can re-enter the guest with an unmasked signal pending to complete
> pending operations.


However, the code appears to be correct.  kvm_run() calls handle_mmio(), 
which returns 0.  The following bit

     if (!r) {
         goto again;
     }

at the end of kvm_run() makes it enter the kernel again (delivering a 
signal to itself in case we want to stop).

>
> - the other instruction in the Linux TPM TIS drivers that for example 
> advance the buffer location do not execute twice, i.e., size++ in the 
> buf[size++] = ... in the Linux driver.
>
>
> What puzzles me is that the read operation may be run twice but others 
> don't.
>

Reads have split execution: kvm emulates the mmio instruction, notices 
that it cannot satisfy the read request, exits to qemu, then restarts 
the instruction.  If the last step is omitted due to savevm, then kvm 
will exit back to qemu again and your device will see the read duplicated.

> If you have insights as why the above may be occurring, please let me 
> know. A simple solution to work around this may be to introduce a 
> register holding the index into the result packet where to read the 
> next byte from (rather than advancing an internal pointer to the next 
> byte), though this would deviate the driver from the standard 
> interface the model currently implements.

Most undesirable, I'd like to fix the bug.

Can you sprinkle some printfs() arount kvm_run (in qemu-kvm.c) to verify 
this?

Good pattern:

   ioctl(KVM_RUN)
   -> KVM_EXIT_MMIO
   ioctl(KVM_RUN)
   -> ENTR
   no further KVM_RUNs

or

    ioctl(KVM_RUN)
    -> something other than KVM_EXIT_MMIO
    no further KVM_RUNs

Bad pattern:

    ioctl(KVM_RUN)
    -> KVM_EXIT_MMIO
    no further KVM_RUNs

(in the save portion of your test)

-- 
error compiling committee.c: too many arguments to function