From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:43778)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <ggang@tilera.com>) id 1UXpsm-0004yS-Hh
	for qemu-devel@nongnu.org; Thu, 02 May 2013 05:32:59 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <ggang@tilera.com>) id 1UXpsj-00078u-Sw
	for qemu-devel@nongnu.org; Thu, 02 May 2013 05:32:56 -0400
Received: from usmamail.tilera.com ([12.216.194.151]:9804)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <ggang@tilera.com>) id 1UXpsj-00078L-Oh
	for qemu-devel@nongnu.org; Thu, 02 May 2013 05:32:53 -0400
Message-ID: <51822DFF.4070300@tilera.com>
Date: Thu, 2 May 2013 17:12:31 +0800
From: Paul Guo <ggang@tilera.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="GB2312"
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] qemu/virtio issue due to non-atomic data access
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

Hello,

I'm developing QEMU I/O support for KVM on arch/tile. During virtio-net testing I consistently see messages similar to the following:

"Guest moved used index from 46573 to 46592"

QEMU then exits immediately, taking the guest down with it. The QEMU version is 0.13.0.

Here is the code that reports the error message:

static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
{
    uint16_t num_heads = vring_avail_idx(vq) - idx;

    /* Check it isn't doing very strange things with descriptor numbers. */
    if (num_heads > vq->vring.num) {
        fprintf(stderr, "Guest moved used index from %u to %u",
                idx, vring_avail_idx(vq));
        exit(1);
    }

    return num_heads;
}

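I looked into this a bit, and it appears to be caused by non-atomic access to some virtio fields in QEMU. In the case above, vq->vring.avail.idx is written by the guest kernel and read by QEMU via lduw_le_p() (in our default hardware configuration). lduw_le_p() loads the 16-bit value one byte at a time.

If the guest kernel updates the value from 0xB5FF to 0xB600 (46592) between those two byte loads, QEMU can read the old low byte (0xFF) together with the new high byte (0xB6), i.e. 0xB6FF (46847). num_heads then becomes 46847 - 46573 = 274, which exceeds vq->vring.num for any ring of 256 entries or fewer, so virtqueue_num_heads() takes the error branch. The second vring_avail_idx() call inside fprintf() sees the completed update, which is why the message reports 46592 rather than the torn value.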

static inline int lduw_le_p(const void *ptr)
{
#ifdef _ARCH_PPC
    int val;
    __asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr));
    return val;
#else
    const uint8_t *p = ptr;
    return p[0] | (p[1] << 8);
#endif
}

The latest QEMU has changed lduw_le_p() to use memcpy(), but unless the compiler can infer that the source pointer passed to memcpy() is aligned, it will probably still have to load byte by byte, so vring_avail_idx() still has this issue.

Would a proper fix be to check whether the address is naturally aligned and, if so, do a single direct load in lduw_le_p(), ldq_le_p(), etc.?

Thanks,
Paul