From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 2 Jan 2012 22:38:11 +0000
From: Stefan Hajnoczi
To: Christoph Hellwig
Cc: kwolf@redhat.com, sheepdog@lists.wpkg.org, qemu-devel@nongnu.org,
 MORITA Kazutaka
Subject: Re: [Qemu-devel] coroutine bug?, was Re: [PATCH] sheepdog: use coroutines
In-Reply-To: <20120102153959.GA22823@lst.de>
References: <1313152395-25248-1-git-send-email-morita.kazutaka@lab.ntt.co.jp>
 <20111223133850.GA12770@lst.de>
 <20111229120626.GA32331@lst.de>
 <20111230103500.GA1740@stefanha-thinkpad.localdomain>
 <20120102153959.GA22823@lst.de>

On Mon, Jan 2, 2012 at 3:39 PM, Christoph Hellwig wrote:
> On Fri, Dec 30, 2011 at 10:35:01AM +0000, Stefan Hajnoczi wrote:
>> If you can reproduce this bug and suspect coroutines are involved then I
>
> It's entirely reproducible.
>
> I've played around a bit and switched from the ucontext to the gthreads
> coroutine implementation.  The result seems odd, but starts to make sense.
>
> Running the workload I now get the following message from qemu:
>
> Co-routine re-entered recursively
>
> and the gdb backtrace looks like:
>
> (gdb) bt
> #0  0x00007f2fff36f405 in *__GI_raise (sig=<optimized out>)
>     at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x00007f2fff372680 in *__GI_abort () at abort.c:92
> #2  0x00007f30019a6616 in qemu_coroutine_enter (co=0x7f3004d4d7b0, opaque=0x0)
>     at qemu-coroutine.c:53
> #3  0x00007f30019a5e82 in qemu_co_queue_next_bh (opaque=<optimized out>)
>     at qemu-coroutine-lock.c:43
> #4  0x00007f30018d5a72 in qemu_bh_poll () at async.c:71
> #5  0x00007f3001982990 in main_loop_wait (nonblocking=<optimized out>)
>     at main-loop.c:472
> #6  0x00007f30018cf714 in main_loop () at /home/hch/work/qemu/vl.c:1481
> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
>     at /home/hch/work/qemu/vl.c:3479
>
> Adding some printks suggests this happens when calling add_aio_request from
> aio_read_response when either delaying creates or updating metadata,
> although not every time one of these cases happens.
>
> I've tried to understand how the recursive calling happens, but unfortunately
> the whole coroutine code lacks any sort of documentation of how it should
> behave or what it asserts about its callers.
>
>> I don't have a sheepdog setup here but if there's an easy way to
>> reproduce please let me know and I'll take a look.
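For context on that abort: frame #2 is the re-entry guard at the top of
qemu_coroutine_enter().  From memory it looks roughly like this (a
paraphrase of qemu-coroutine.c from around this revision, not an exact
quote):

    void qemu_coroutine_enter(Coroutine *co, void *opaque)
    {
        Coroutine *self = qemu_coroutine_self();

        /* co->caller is set when a coroutine is entered and cleared
         * again when it yields or terminates.  If it is still set
         * here, the coroutine has not yielded back yet and something
         * is trying to enter it a second time. */
        if (co->caller) {
            fprintf(stderr, "Co-routine re-entered recursively\n");
            abort();
        }

        co->caller = self;
        co->entry_arg = opaque;
        coroutine_swap(self, co);
    }

So the message means two independent callers tried to enter the same
coroutine before it yielded back.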
> With the small patch below applied to the sheepdog source I can reproduce
> the issue on my laptop using the following setup:
>
> for port in 7000 7001 7002; do
>     mkdir -p /mnt/sheepdog/$port
>     /usr/sbin/sheep -p $port -c local /mnt/sheepdog/$port
>     sleep 2
> done
>
> collie cluster format
> collie vdi create test 20G
>
> then start a qemu instance that uses the sheepdog volume with the
> following device and drive lines:
>
>         -drive if=none,file=sheepdog:test,cache=none,id=test \
>         -device virtio-blk-pci,drive=test,id=testdev \
>
> finally, in the guest run:
>
>         dd if=/dev/zero of=/dev/vdX bs=67108864 count=128 oflag=direct

Thanks for these instructions.  I can reproduce the issue here.

The way that BDRVSheepdogState->co_recv and ->co_send work looks
suspicious.  The code adds select(2) read/write callback functions on
the sheepdog socket file descriptor.  When the socket becomes readable
or writeable, the co_recv or co_send coroutine is entered.  So far, so
good: this is how a coroutine is integrated into the main loop of QEMU.

The problem is that this patch mixes the two.  The co_recv coroutine
runs aio_read_response(), which invokes send_pending_req(), which in
turn invokes add_aio_request().  That function isn't suitable for
co_recv's context because it actually sends data and hits a few
blocking (yield) points.  It takes a coroutine mutex - but the
select(2) read callback is still in place.  We're still in the
aio_read_response() call chain, except we're not reading at all; we're
trying to write!  And we'll get spurious wakeups if there is any data
readable on the socket.

So the co_recv coroutine has two things in the system that will try to
enter it:

1. The select(2) read callback on the sheepdog socket.
2. The add_aio_request() blocking operations, including the coroutine
   mutex.

This is bad: a yielded coroutine should have only one thing that will
enter it.  It's rare that it makes sense to have multiple things
entering a coroutine.

It's late here, but I hope this gives Kazutaka some thoughts on what is
causing the issue with this patch.
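To make the failure mode concrete, here is a minimal sketch of the
shape I mean.  This is simplified pseudocode, not the actual sheepdog
code: the function bodies are elided down to the two wakeup sources,
and s->lock stands in for whatever CoMutex the send path takes.

    /* Wakeup source #1: the select(2) read callback registered on the
     * sheepdog socket enters s->co_recv whenever the fd is readable. */
    static void co_read_response(void *opaque)
    {
        BDRVSheepdogState *s = opaque;

        qemu_coroutine_enter(s->co_recv, NULL);
    }

    static void coroutine_fn aio_read_response(void *opaque)
    {
        BDRVSheepdogState *s = opaque;

        /* ... read a response, yielding until more data arrives;
         * being re-entered by co_read_response() is the one
         * legitimate wakeup for this coroutine ... */

        send_pending_req(s);    /* calls add_aio_request() below */
    }

    static void coroutine_fn add_aio_request(BDRVSheepdogState *s)
    {
        /* Wakeup source #2: taking this CoMutex can yield, after
         * which qemu_co_mutex_unlock() in another coroutine re-enters
         * s->co_recv via a bottom half (qemu_co_queue_next_bh, frame
         * #3 in the backtrace).  But the read callback above is still
         * registered, so readable socket data can also enter
         * s->co_recv -- two wakeup sources for one yielded coroutine,
         * and qemu_coroutine_enter() eventually aborts. */
        qemu_co_mutex_lock(&s->lock);

        /* ... send the request, possibly yielding again ... */

        qemu_co_mutex_unlock(&s->lock);
    }

Whatever the fix ends up looking like, the send path probably must not
run in co_recv's context, so that each yielded coroutine has exactly
one thing that will wake it.

Stefan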