From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:43778)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1UXpsm-0004yS-Hh for qemu-devel@nongnu.org;
    Thu, 02 May 2013 05:32:59 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1UXpsj-00078u-Sw for qemu-devel@nongnu.org;
    Thu, 02 May 2013 05:32:56 -0400
Received: from usmamail.tilera.com ([12.216.194.151]:9804)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1UXpsj-00078L-Oh for qemu-devel@nongnu.org;
    Thu, 02 May 2013 05:32:53 -0400
Message-ID: <51822DFF.4070300@tilera.com>
Date: Thu, 2 May 2013 17:12:31 +0800
From: Paul Guo
MIME-Version: 1.0
Content-Type: text/plain; charset="GB2312"
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] qemu/virtio issue due to non-atomic data access
To: qemu-devel@nongnu.org

Hello,

I'm developing the qemu io support for kvm on arch/tile. During
virtio-net testing I consistently see messages like the following:

    "Guest moved used index from 46573 to 46592"

The guest os then exits immediately. The qemu version is 0.13.0.

Here is the code that reports the error message:

    static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
    {
        uint16_t num_heads = vring_avail_idx(vq) - idx;

        /* Check it isn't doing very strange things with descriptor numbers. */
        if (num_heads > vq->vring.num) {
            fprintf(stderr, "Guest moved used index from %u to %u",
                    idx, vring_avail_idx(vq));
            exit(1);
        }

        return num_heads;
    }

I looked into this issue a bit, and it seems to be caused by non-atomic
access to some virtio variables in qemu.

In the above case, vq->vring.avail.idx is modified by the kernel and is
read in qemu via lduw_le_p() (in our default hw configuration).
lduw_le_p() loads the 16-bit value byte by byte. If the kernel is
updating the value from 0xB5FF to 0xB600 (i.e. 46592) between the two
byte loads, qemu can read 0xB6FF, and then virtqueue_num_heads() enters
the error handling branch. Note that the message itself calls
vring_avail_idx() a second time, so it prints the correct new value
rather than the torn one.

    static inline int lduw_le_p(const void *ptr)
    {
    #ifdef _ARCH_PPC
        int val;
        __asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr));
        return val;
    #else
        const uint8_t *p = ptr;
        return p[0] | (p[1] << 8);
    #endif
    }

The latest qemu uses memcpy() in lduw_le_p() instead, but if the
compiler cannot infer the alignment of the pointer passed to memcpy(),
it will probably still have to load byte by byte, so vring_avail_idx()
still has this issue.

A proper fix for this issue seems to be: check whether the address is
aligned and do a direct load for the aligned case in ldq_le_p(), etc.?

Thanks,
Paul
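
A rough sketch of the idea proposed above, for illustration only. It is
not taken from the qemu tree and has not been tested on arch/tile. The
names torn_value() and lduw_le_p_sketch() are hypothetical, and the
byte-order check relies on the GCC/Clang predefined __BYTE_ORDER__
macro. The first helper reproduces the torn value from the example
(low byte read before the update, high byte after). The second does a
single 16-bit load when the address is 2-byte aligned and falls back to
byte loads otherwise. On most hosts an aligned 16-bit load is a single
instruction, but C itself does not guarantee that.

```c
#include <stdint.h>

/* Value seen by a byte-wise reader that loads the low byte before the
 * writer's update and the high byte after it. */
static inline uint16_t torn_value(uint16_t old_val, uint16_t new_val)
{
    return (uint16_t)((new_val & 0xFF00u) | (old_val & 0x00FFu));
}

/* Hypothetical variant of lduw_le_p(): one 16-bit access when the
 * address is 2-byte aligned, byte loads otherwise. */
static inline int lduw_le_p_sketch(const void *ptr)
{
    if (((uintptr_t)ptr & 1) == 0) {
        /* volatile asks the compiler for exactly one access of this
         * width; whether it is a single instruction is host-specific. */
        uint16_t v = *(const volatile uint16_t *)ptr;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        /* Stored data is little-endian, so swap on big-endian hosts. */
        v = (uint16_t)((v << 8) | (v >> 8));
#endif
        return v;
    } else {
        /* Unaligned: same byte-wise load as before (can still tear). */
        const uint8_t *p = ptr;
        return p[0] | (p[1] << 8);
    }
}
```

With the values from the report, torn_value(0xB5FF, 0xB600) gives
0xB6FF. Since vring.avail.idx is a naturally aligned 16-bit field, the
aligned branch is the one vring_avail_idx() would be expected to take.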