From: Paolo Bonzini
Date: Fri, 22 Feb 2019 15:55:16 +0100
Subject: Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error
To: Fernando Casas Schössow, Stefan Hajnoczi
Cc: Natanael Copa, Richard Henderson, qemu-devel

On 22/02/19 15:43, Fernando Casas Schössow wrote:
> Hi all,
>
> Indeed I can confirm this issue occurs with the stock, unmodified
> QEMU binary provided with Alpine since at least Alpine 3.6 up to
> 3.9.
>
> Is there any compiler flag I can tweak, add or remove to rebuild qemu
> and try to repro to confirm a possible workaround?

Nope, not yet.  However, I can try some experiments if you can provide
some information on how to rebuild an apk.

Paolo

> Thanks.
>
> Fernando
>
> Sent from my iPhone
>
>> On 22 Feb 2019, at 15:04, Stefan Hajnoczi wrote:
>>
>> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
>> wrote:
>>
>> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.
>>
>> Fernando: Can you confirm that the bug occurs with an unmodified
>> Alpine Linux qemu binary?
>>
>> Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
>> host endian unaligned access functions") introduced the unaligned
>> memory access functions in question here.  Please see below for
>> details on the bug - basically QEMU code assumes they are atomic, but
>> that is not guaranteed :(.  Any ideas for how to fix this?
>>
>> Natanael: It seems likely that the qemu package in Alpine Linux
>> suffers from a compilation issue resulting in a broken QEMU.  It may
>> be necessary to leave the compiler optimization flag alone in APKBUILD
>> to work around this problem.
>>
>> Here are the details.  QEMU relies on the compiler turning
>> memcpy(&dst, &src, 2) into a load instruction in
>> include/qemu/bswap.h:lduw_he_p() (explanation below):
>>
>>  /* Any compiler worth its salt will turn these memcpy into native unaligned
>>     operations.  Thus we don't need to play games with packed attributes, or
>>     inline byte-by-byte stores.  */
>>
>>  static inline int lduw_he_p(const void *ptr)
>>  {
>>      uint16_t r;
>>      memcpy(&r, ptr, sizeof(r));
>>      return r;
>>  }
>>
>> Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that
>> shows the load instruction:
>>
>>   398166:       0f b7 42 02             movzwl 0x2(%rdx),%eax
>>   39816a:       66 89 43 32             mov    %ax,0x32(%rbx)
>>
>> Here is the instruction sequence in the Alpine Linux binary:
>>
>>   455562:       ba 02 00 00 00          mov    $0x2,%edx
>>   455567:       e8 74 24 f3 ff          callq  3879e0
>>   45556c:       0f b7 44 24 42          movzwl 0x42(%rsp),%eax
>>   455571:       66 41 89 47 32          mov    %ax,0x32(%r15)
>>
>> It's calling memcpy instead of using a load instruction.
>>
>> Fernando found that QEMU's virtqueue_pop() function sees bogus values
>> when loading a 16-bit guest RAM location.  Paolo figured out that the
>> bogus value can be produced by memcpy() when another thread is
>> updating the 16-bit memory location simultaneously.
>> This is a race condition between one thread loading the 16-bit value
>> and another thread storing it (in this case a guest vcpu thread).
>> Sometimes memcpy() may load one old byte and one new byte, resulting
>> in a bogus value.
>>
>> The symptom that Fernando experienced is a "Virtqueue size exceeded"
>> error message from QEMU and then the virtio-blk or virtio-scsi device
>> stops working.  This issue potentially affects other device emulation
>> code in QEMU as well, not just virtio devices.
>>
>> For the time being, I suggest tweaking the APKBUILD so the memcpy() is
>> not generated.  Hopefully QEMU can make the code more portable in the
>> future so the compiler always does the expected thing, but this may
>> not be easily possible.
>>
>> Stefan
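
For anyone following along, the torn read described in the quoted message can be illustrated with a small C sketch. This is only an illustration under stated assumptions, not QEMU code: demo_torn_read() and load16_atomic() are hypothetical names, the "other thread" is simulated by a plain store placed between the two byte reads so the result is deterministic, and __atomic_load_n is the GCC/Clang builtin, assumed here to be used on a naturally aligned 16-bit location. It is not meant to suggest what the eventual fix in QEMU should look like.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical demo: mimic a byte-by-byte copy (which an out-of-line
 * memcpy() is allowed to perform) racing with a 16-bit store. The store
 * is done inline between the two byte reads to make the outcome
 * deterministic instead of relying on real thread timing. */
static uint16_t demo_torn_read(uint16_t old_val, uint16_t new_val)
{
    uint16_t shared = old_val;
    const unsigned char *src = (const unsigned char *)&shared;
    unsigned char buf[2];
    uint16_t r;

    buf[0] = src[0];    /* reader copies the first byte of the old value */
    shared = new_val;   /* simulated writer stores the new value */
    buf[1] = src[1];    /* reader copies the second byte of the new value */

    memcpy(&r, buf, sizeof(r));
    return r;           /* mixes one old byte with one new byte */
}

/* Hypothetical helper: a 16-bit load the compiler may not split into
 * byte accesses. Assumes the location is naturally aligned. */
static uint16_t load16_atomic(uint16_t *shared)
{
    assert(((uintptr_t)shared % sizeof(*shared)) == 0);
    return __atomic_load_n(shared, __ATOMIC_RELAXED);
}
```

Calling demo_torn_read(0x00FF, 0x0100) returns a value that is neither 0x00FF nor 0x0100 on both little-endian and big-endian hosts, which is the kind of bogus index that would trip the "Virtqueue size exceeded" check. load16_atomic() always returns a value that was actually stored.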