From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:45927)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Wnr0S-0001sn-1p for qemu-devel@nongnu.org;
	Fri, 23 May 2014 11:03:45 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1Wnr0G-0006TC-Ib for qemu-devel@nongnu.org;
	Fri, 23 May 2014 11:03:36 -0400
Received: from mail-wi0-x233.google.com ([2a00:1450:400c:c05::233]:49123)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Wnr0G-0006Sf-Cc for qemu-devel@nongnu.org;
	Fri, 23 May 2014 11:03:24 -0400
Received: by mail-wi0-f179.google.com with SMTP id bs8so1025785wib.12
	for ; Fri, 23 May 2014 08:03:23 -0700 (PDT)
Date: Fri, 23 May 2014 14:58:47 +0200
From: Stefan Hajnoczi
Message-ID: <20140523125847.GD5990@stefanha-thinkpad.hitronhub.home>
References: <537E62CE.2050302@beyond.pl>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <537E62CE.2050302@beyond.pl>
Subject: Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
To: Marcin =?utf-8?Q?Gibu=C5=82a?=
Cc: "qemu-devel@nongnu.org"

On Thu, May 22, 2014 at 10:49:18PM +0200, Marcin Gibuła wrote:
> This is backtrace of qemu process:
>
> (gdb) thread apply all backtrace

[...]
a bunch of rbd threads, vnc worker thread, QEMU worker threads

> Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
> #0  0x00007f6998020286 in ppoll () from /lib64/libc.so.6
> #1  0x00007f699c1f3d9b in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
>     timeout=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:311
> #3  0x00007f699c0877e0 in aio_poll (ctx=0x7f699e4c9c00,
>     blocking=blocking@entry=true) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/aio-posix.c:221
> #4  0x00007f699c095c0a in bdrv_drain_all () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805

QEMU is waiting for all block I/O requests to complete. I wonder if
there is some weird interaction with rbd here, which is why this never
completes.

In gdb you can iterate over bdrv_states to inspect the open
BlockDriverState structs. Each BDS struct has a tracked_requests list
and bdrv_drain_all() is waiting for this pending requests list to become
empty.

If you see a pending request on a RADOS block device (rbd) then it would
be good to dig deeper into QEMU's block/rbd.c driver to see why it's not
completing that request.

Are you using qcow2 on top of rbd?
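
The bdrv_states walk described above can also be scripted with gdb's
Python support. This is only a rough sketch, not something taken from
the QEMU tree. It assumes the QEMU 2.0 layout, where bdrv_states is a
QTAILQ linked through a "device_list" entry, tracked_requests is a QLIST
linked through a "list" entry, and each BDS has a "filename" char array.
The helper names iter_list and pending_requests are made up for
illustration. Check the field names against your source tree before
relying on it.

```python
# Sketch: list BlockDriverStates that still have tracked requests.
# Field names below are ASSUMPTIONS about QEMU 2.0 internals; verify them.
try:
    import gdb  # only importable inside a gdb session
except ImportError:
    gdb = None


def iter_list(first, entry_field, next_field):
    """Yield each node of a BSD-style intrusive list until a NULL pointer."""
    node = first
    while int(node) != 0:
        yield node
        node = node[entry_field][next_field]


def pending_requests(bdrv_states_head):
    """Return [(bs, [req, ...])] for every BDS whose request list is non-empty."""
    report = []
    # Assumed: QTAILQ head field "tqh_first", entry "device_list", link "tqe_next".
    for bs in iter_list(bdrv_states_head["tqh_first"], "device_list", "tqe_next"):
        # Assumed: QLIST head field "lh_first", entry "list", link "le_next".
        reqs = list(iter_list(bs["tracked_requests"]["lh_first"], "list", "le_next"))
        if reqs:
            report.append((bs, reqs))
    return report


if gdb is not None:
    # gdb.parse_and_eval evaluates an expression in the debugged process.
    for bs, reqs in pending_requests(gdb.parse_and_eval("bdrv_states")):
        # Assumed: "filename" is a char array inside BlockDriverState.
        print("%s: %d pending request(s)" % (bs["filename"].string(), len(reqs)))
```

Saved to a file and loaded with gdb's "source" command while attached to
the hung process, this should print one line per BDS that still has
requests in flight. An entry for the rbd image would point at
block/rbd.c.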

> #5  0x00007f699c09c87e in bdrv_close (bs=bs@entry=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1695
> #6  0x00007f699c09c5fa in bdrv_delete (bs=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1978
> #7  bdrv_unref (bs=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:5198
> #8  0x00007f699c09c812 in bdrv_drop_intermediate
>     (active=active@entry=0x7f699ebfd330, top=top@entry=0x7f699f0bc520,
>     base=base@entry=0x7f699eec43d0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:2567
> #9  0x00007f699c0a1963 in commit_run (opaque=0x7f699f17dcc0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block/commit.c:144
> #10 0x00007f699c0e0dca in coroutine_trampoline (i0=<optimized out>,
>     i1=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/coroutine-ucontext.c:118
> #11 0x00007f6997f859f0 in ?? () from /lib64/libc.so.6
> #12 0x00007fffdbe06750 in ?? ()
> #13 0x0000000000000000 in ?? ()