From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55174) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YW3Fl-0007JU-9V for qemu-devel@nongnu.org; Thu, 12 Mar 2015 09:34:22 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YW3Fg-0000CJ-8v for qemu-devel@nongnu.org; Thu, 12 Mar 2015 09:34:21 -0400 Received: from mail-wi0-x233.google.com ([2a00:1450:400c:c05::233]:41169) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YW3Ff-0000CD-Vl for qemu-devel@nongnu.org; Thu, 12 Mar 2015 09:34:16 -0400 Received: by wiwl15 with SMTP id l15so20436267wiw.0 for ; Thu, 12 Mar 2015 06:34:15 -0700 (PDT) Date: Thu, 12 Mar 2015 13:34:12 +0000 From: Stefan Hajnoczi Message-ID: <20150312133412.GS10493@stefanha-thinkpad.redhat.com> References: <1426002357-6889-1-git-send-email-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="qZLIv6EoKi7YuaSc" Content-Disposition: inline In-Reply-To: <1426002357-6889-1-git-send-email-pbonzini@redhat.com> Subject: Re: [Qemu-devel] [PATCH] queue: fix QSLIST_INSERT_HEAD_ATOMIC race List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: borntraeger@de.ibm.com, qemu-devel@nongnu.org, stefanha@redhat.com --qZLIv6EoKi7YuaSc Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Mar 10, 2015 at 04:45:57PM +0100, Paolo Bonzini wrote: > There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC. >=20 > Because atomic_cmpxchg returns the old value instead of a success flag, > QSLIST_INSERT_HEAD_ATOMIC was checking for success by comparing against > the second argument to atomic_cmpxchg. Unfortunately, this only works > if the second argument is a local or thread-local variable. >=20 > If it is in memory, it can be subject to common subexpression elimination > (and then everything's fine) or reloaded after the atomic_cmpxchg, > depending on the compiler's whims. If the latter happens, the race can > happen. A thread can sneak in, doing something on elm->field.sle_next > after the atomic_cmpxchg and before the comparison. This causes a wrong > failure, and then two threads are using "elm" at the same time. In the > case discovered by Christian, the sequence was likely something like this: >=20 > thread 1 | thread 2 > QSLIST_INSERT_HEAD_ATOMIC | > atomic_cmpxchg succeeds | > elm added to list | > | steal release_pool > | QSLIST_REMOVE_HEAD > | elm removed from list > | ... > | QSLIST_INSERT_HEAD_ATOMIC > | (overwrites sle_next) > spurious failure | > atomic_cmpxchg succeeds | > elm added to list again | > | > steal release_pool | > QSLIST_REMOVE_HEAD | > elm removed again | >=20 > The last three steps could be done by a third thread as well. > A reproducer that failed in a matter of seconds is as follows: >=20 > - the guest has 32 VCPUs on a 28 core host (hyperthreading was enabled), > memory was 16G just to err on the safe side (the host has 64G, but hey > at least you need no s390) >=20 > - the guest has 24 null-aio virtio-blk devices using dataplane > (-object iothread,id=3DioN -drive if=3Dnone,id=3DblkN,driver=3Dnull-aio= ,size=3D500G > -device virtio-blk-pci,iothread=3DioN,drive=3DblkN) >=20 > - the guest also has a single network interface. It's only doing loopback > tests so slirp vs. tap and the model doesn't matter. >=20 > - the guest is running fio with the following script: >=20 > [global] > rw=3Drandread > blocksize=3D16k > ioengine=3Dlibaio > runtime=3D10m > buffered=3D0 > fallocate=3Dnone > time_based > iodepth=3D32 >=20 > [virtio1a] > filename=3D/dev/block/252\:16 >=20 > [virtio1b] > filename=3D/dev/block/252\:16 >=20 > ... >=20 > [virtio24a] > filename=3D/dev/block/252\:384 >=20 > [virtio24b] > filename=3D/dev/block/252\:384 >=20 > [listen1] > protocol=3Dtcp > ioengine=3Dnet > port=3D12345 > listen > rw=3Dread > bs=3D4k > size=3D1000g >=20 > [connect1] > protocol=3Dtcp > hostname=3Dlocalhost > ioengine=3Dnet > port=3D12345 > protocol=3Dtcp > rw=3Dwrite > startdelay=3D1 > size=3D1000g >=20 > ... >=20 > [listen8] > protocol=3Dtcp > ioengine=3Dnet > port=3D12352 > listen > rw=3Dread > bs=3D4k > size=3D1000g >=20 > [connect8] > protocol=3Dtcp > hostname=3Dlocalhost > ioengine=3Dnet > port=3D12352 > rw=3Dwrite > startdelay=3D1 > size=3D1000g >=20 > Moral of the story: I should refrain from writing more clever stuff. > At least it looks like it is not too clever to be undebuggable. >=20 > Reported-by: Christian Borntraeger > Fixes: c740ad92d0d958fa785e5d7aa1b67ecaf30a6a54 > Signed-off-by: Paolo Bonzini > --- > include/qemu/queue.h | 11 ++++++----- > 1 file changed, 6 insertions(+), 5 deletions(-) Reviewed-by: Stefan Hajnoczi --qZLIv6EoKi7YuaSc Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJVAZXUAAoJEJykq7OBq3PIIDcIALt2fCHUV4bYjCk+4oQSGvxl 1FDEeY0vV4ZibhA2CnZk2UWF77M1wXAwuK1ItDNGh/cdSVKw2EjmRqQ0r+PzzU6v 9e1s9aRYioZAIURptprN7UQManESHWJTEVYQXXsqPng9hqwjDFrj3T3XNhYJ9yXP dybtiqBLmYb2XluVTjOYBxBStwbm67sjeWVu+aZ2QT27lZKxr9CtzUjpapCjgWoc UQQhnv2CzYBDKKzoL+aWtZ2S5Ozp5NkW7Abk2QYBnCWFNuSc+wGPjEctMHpxxrqL Ic/rSraa2SAeALiDK1CQF3XvuzQfgqyLAdDxrJ95EpyjaZHYkyM1tqb5goADONY= =ZBPj -----END PGP SIGNATURE----- --qZLIv6EoKi7YuaSc--