From mboxrd@z Thu Jan  1 00:00:00 1970
Sender: Paolo Bonzini
References: <1437565425-29861-1-git-send-email-stefanha@redhat.com>
 <1437565425-29861-6-git-send-email-stefanha@redhat.com>
 <20150723161413.15ec718a.cornelia.huck@de.ibm.com>
 <55B12274.2050005@redhat.com>
From: Paolo Bonzini
Message-ID: <55B13046.2060205@redhat.com>
Date: Thu, 23 Jul 2015 20:19:50 +0200
MIME-Version: 1.0
In-Reply-To: <55B12274.2050005@redhat.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PULL v2 for-2.4 v2 5/7] AioContext: fix broken ctx->dispatching optimization
To: Cornelia Huck
Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org, Stefan Hajnoczi,
 Christian Borntraeger

On 23/07/2015 19:20, Paolo Bonzini wrote:
>
>
> On 23/07/2015 16:14, Cornelia Huck wrote:
>> (gdb) bt
>> #0  0x000003fffc5871b4 in pthread_cond_wait@@GLIBC_2.3.2 ()
>>     from /lib64/libpthread.so.0
>> #1  0x000000008024cfca in qemu_cond_wait (cond=cond@entry=0x9717d950,
>>     mutex=mutex@entry=0x9717d920)
>>     at /data/git/yyy/qemu/util/qemu-thread-posix.c:132
>> #2  0x000000008025e83a in rfifolock_lock (r=0x9717d920)
>>     at /data/git/yyy/qemu/util/rfifolock.c:59
>> #3  0x00000000801b78fa in aio_context_acquire (ctx=<optimized out>)
>>     at /data/git/yyy/qemu/async.c:331
>> #4  0x000000008007ceb4 in virtio_blk_data_plane_start (s=0x9717d710)
>>     at /data/git/yyy/qemu/hw/block/dataplane/virtio-blk.c:285
>> #5  0x000000008007c64a in virtio_blk_handle_output (vdev=<optimized out>,
>>     vq=<optimized out>) at /data/git/yyy/qemu/hw/block/virtio-blk.c:599
>> #6  0x00000000801c56dc in qemu_iohandler_poll (pollfds=0x97142800,
>>     ret=ret@entry=1) at /data/git/yyy/qemu/iohandler.c:126
>> #7  0x00000000801c5178 in main_loop_wait (nonblocking=<optimized out>)
>>     at /data/git/yyy/qemu/main-loop.c:494
>> #8  0x0000000080013ee2 in main_loop () at /data/git/yyy/qemu/vl.c:1902
>> #9  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
>>     at /data/git/yyy/qemu/vl.c:4653
>>
>> I've stripped down the setup to the following commandline:
>>
>> /data/git/yyy/qemu/build/s390x-softmmu/qemu-system-s390x -machine
>> s390-ccw-virtio-2.4,accel=kvm,usb=off -m 1024 -smp
>> 4,sockets=4,cores=1,threads=1 -nographic -drive
>> file=/dev/sda,if=none,id=drive-virtio-disk0,format=raw,serial=ccwzfcp1,cache=none,aio=native
>> -device
>> virtio-blk-ccw,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,x-data-plane=on
>
> What's the backtrace like for the other threads?  This is almost
> certainly a latent bug somewhere else.

BTW, I can reproduce this---I'm asking because I cannot even attach gdb
to the hung process.

The simplest workaround is to reintroduce commit a0710f7995 (iothread:
release iothread around aio_poll, 2015-02-20), though it also comes with
some risk.  It avoids the bug because it limits the contention on the
RFifoLock.

Paolo
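[Editor's note: for readers following the backtrace, frames #0-#2 show the main loop blocked in pthread_cond_wait inside rfifolock_lock. Below is a minimal, simplified sketch of a recursive ticket (FIFO) lock in the spirit of QEMU's RFifoLock; field names and layout are illustrative only, not the actual util/rfifolock.c code.]

```c
#include <assert.h>
#include <pthread.h>

/* Sketch of a recursive FIFO (ticket) lock. Each acquirer takes a
 * ticket; only the thread whose ticket equals `head` may hold the
 * lock, so waiters are served strictly in arrival order. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    unsigned head, tail;   /* ticket numbers: head = now serving */
    pthread_t owner;       /* thread currently holding the lock */
    unsigned nesting;      /* recursion depth of the owner */
} RFifoLock;

void rfifolock_init(RFifoLock *r)
{
    pthread_mutex_init(&r->mutex, NULL);
    pthread_cond_init(&r->cond, NULL);
    r->head = r->tail = 0;
    r->nesting = 0;
}

void rfifolock_lock(RFifoLock *r)
{
    pthread_mutex_lock(&r->mutex);
    /* Recursive acquisition by the owner just bumps the count. */
    if (r->nesting > 0 && pthread_equal(r->owner, pthread_self())) {
        r->nesting++;
        pthread_mutex_unlock(&r->mutex);
        return;
    }
    unsigned ticket = r->tail++;
    /* This is where frames #0-#2 of the backtrace sit: waiting for
     * our ticket to come up while another thread holds the lock. */
    while (ticket != r->head || r->nesting > 0) {
        pthread_cond_wait(&r->cond, &r->mutex);
    }
    r->owner = pthread_self();
    r->nesting = 1;
    pthread_mutex_unlock(&r->mutex);
}

void rfifolock_unlock(RFifoLock *r)
{
    pthread_mutex_lock(&r->mutex);
    if (--r->nesting == 0) {
        r->head++;                       /* admit the next ticket */
        pthread_cond_broadcast(&r->cond);
    }
    pthread_mutex_unlock(&r->mutex);
}
```

The sketch also shows why the a0710f7995 workaround helps: releasing the AioContext around aio_poll shortens the window in which one thread sits on `head`, so other acquirers spend less time parked in pthread_cond_wait and contention on the lock drops.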