From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55522) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YW3Gr-0008Ht-Sz for qemu-devel@nongnu.org; Thu, 12 Mar 2015 09:35:31 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YW3Gm-0000go-TV for qemu-devel@nongnu.org; Thu, 12 Mar 2015 09:35:29 -0400 Received: from mail-wi0-x22a.google.com ([2a00:1450:400c:c05::22a]:39722) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YW3Gm-0000gL-Ib for qemu-devel@nongnu.org; Thu, 12 Mar 2015 09:35:24 -0400 Received: by wiwl15 with SMTP id l15so47744660wiw.4 for ; Thu, 12 Mar 2015 06:35:21 -0700 (PDT) Date: Thu, 12 Mar 2015 13:35:19 +0000 From: Stefan Hajnoczi Message-ID: <20150312133519.GT10493@stefanha-thinkpad.redhat.com> References: <1426002357-6889-1-git-send-email-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="D5tFrmRBv7YOLFOK" Content-Disposition: inline In-Reply-To: <1426002357-6889-1-git-send-email-pbonzini@redhat.com> Subject: Re: [Qemu-devel] [PATCH] queue: fix QSLIST_INSERT_HEAD_ATOMIC race List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: borntraeger@de.ibm.com, qemu-devel@nongnu.org, stefanha@redhat.com --D5tFrmRBv7YOLFOK Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Mar 10, 2015 at 04:45:57PM +0100, Paolo Bonzini wrote: > There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC. >=20 > Because atomic_cmpxchg returns the old value instead of a success flag, > QSLIST_INSERT_HEAD_ATOMIC was checking for success by comparing against > the second argument to atomic_cmpxchg. Unfortunately, this only works > if the second argument is a local or thread-local variable. >=20 > If it is in memory, it can be subject to common subexpression elimination > (and then everything's fine) or reloaded after the atomic_cmpxchg, > depending on the compiler's whims. If the latter happens, the race can > happen. A thread can sneak in, doing something on elm->field.sle_next > after the atomic_cmpxchg and before the comparison. This causes a wrong > failure, and then two threads are using "elm" at the same time. In the > case discovered by Christian, the sequence was likely something like this: >=20 > thread 1 | thread 2 > QSLIST_INSERT_HEAD_ATOMIC | > atomic_cmpxchg succeeds | > elm added to list | > | steal release_pool > | QSLIST_REMOVE_HEAD > | elm removed from list > | ... > | QSLIST_INSERT_HEAD_ATOMIC > | (overwrites sle_next) > spurious failure | > atomic_cmpxchg succeeds | > elm added to list again | > | > steal release_pool | > QSLIST_REMOVE_HEAD | > elm removed again | >=20 > The last three steps could be done by a third thread as well. > A reproducer that failed in a matter of seconds is as follows: >=20 > - the guest has 32 VCPUs on a 28 core host (hyperthreading was enabled), > memory was 16G just to err on the safe side (the host has 64G, but hey > at least you need no s390) >=20 > - the guest has 24 null-aio virtio-blk devices using dataplane > (-object iothread,id=3DioN -drive if=3Dnone,id=3DblkN,driver=3Dnull-aio= ,size=3D500G > -device virtio-blk-pci,iothread=3DioN,drive=3DblkN) >=20 > - the guest also has a single network interface. It's only doing loopback > tests so slirp vs. tap and the model doesn't matter. >=20 > - the guest is running fio with the following script: >=20 > [global] > rw=3Drandread > blocksize=3D16k > ioengine=3Dlibaio > runtime=3D10m > buffered=3D0 > fallocate=3Dnone > time_based > iodepth=3D32 >=20 > [virtio1a] > filename=3D/dev/block/252\:16 >=20 > [virtio1b] > filename=3D/dev/block/252\:16 >=20 > ... >=20 > [virtio24a] > filename=3D/dev/block/252\:384 >=20 > [virtio24b] > filename=3D/dev/block/252\:384 >=20 > [listen1] > protocol=3Dtcp > ioengine=3Dnet > port=3D12345 > listen > rw=3Dread > bs=3D4k > size=3D1000g >=20 > [connect1] > protocol=3Dtcp > hostname=3Dlocalhost > ioengine=3Dnet > port=3D12345 > protocol=3Dtcp > rw=3Dwrite > startdelay=3D1 > size=3D1000g >=20 > ... >=20 > [listen8] > protocol=3Dtcp > ioengine=3Dnet > port=3D12352 > listen > rw=3Dread > bs=3D4k > size=3D1000g >=20 > [connect8] > protocol=3Dtcp > hostname=3Dlocalhost > ioengine=3Dnet > port=3D12352 > rw=3Dwrite > startdelay=3D1 > size=3D1000g >=20 > Moral of the story: I should refrain from writing more clever stuff. > At least it looks like it is not too clever to be undebuggable. >=20 > Reported-by: Christian Borntraeger > Fixes: c740ad92d0d958fa785e5d7aa1b67ecaf30a6a54 > Signed-off-by: Paolo Bonzini > --- > include/qemu/queue.h | 11 ++++++----- > 1 file changed, 6 insertions(+), 5 deletions(-) Thanks, applied to my block tree for QEMU 2.3: https://github.com/stefanha/qemu/commits/block Stefan --D5tFrmRBv7YOLFOK Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJVAZYXAAoJEJykq7OBq3PIr+YIAJ7ilRlHr0nGYQnHzW2oeStX U00PYluS37BxT8dhn38c5VYf08gPA7bZDU8dNs/RUoyDkAPKhgI7tRfvD0YcWHJv yrXU86IIdoMWo2GSCbFpbOucih1OjUE82zS8t4mJB0xrhGIIxkPzKSvdI0pznKI4 Xo6S0lSfrPpPNL0mVMQSIgrWZ/Xrg2kp+Jw7/iuYWDhVsH+IWjmfpnQ9UvheMSzt 67R7Wf6qXfEqPAxGtjaVh+7v/Fsxs0bDjq6mrq1NXh5QiLlti4KqC+BIjEKHrfn2 iXt0uJ7TGESPBM80kP1C/JQ2Hkokm+8iPUUpix3jFeqPXbYc3w9PYh4szgCjTWg= =DRtS -----END PGP SIGNATURE----- --D5tFrmRBv7YOLFOK--