From: Marcin Gibuła
Date: Fri, 23 May 2014 14:38:46 +0200
Subject: Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Paolo Bonzini
Message-ID: <537F4156.8090105@beyond.pl>
In-Reply-To: <20140523122906.GB5254@noname.redhat.com>
References: <537E62CE.2050302@beyond.pl> <537E66A5.9010609@beyond.pl> <537F047C.9010800@redhat.com> <537F141C.5080102@beyond.pl> <20140523122906.GB5254@noname.redhat.com>

> I see that you have a mix of aio=native and aio=threads. I can't say
> much about the aio=native disks (perhaps try to reproduce without
> them?), but there are definitely no worker threads for the other disks
> that bdrv_drain_all() would have to wait for.

True. But I/O was being done only on the qcow2 disk with the threads
backend, and the snapshot was made on this disk.

I'll try to reproduce with all 'threads'.

> bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the
> function that determines for each of the disks in your VM if it still
> has requests in flight that need to be completed. This function must
> have returned true even though there is nothing to wait for.
>
> Can you check which of its conditions led to this behaviour, and for
> which disk it did? Either by setting a breakpoint there and
> singlestepping through the function the next time it is called (if the
> poll even has a timeout), or by inspecting the conditions manually in
> gdb.

I'm on it.

--
mg
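
The check described in the quoted text can be sketched roughly as follows. This is a toy Python model, not QEMU code: the `Disk` class, its fields (`in_flight`, `throttled`, `child`) and the helpers `pending_reason()` and `report()` are all hypothetical names chosen for illustration. It only shows the idea of walking each disk and recording which condition claims there is still pending work.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Disk:
    """Toy stand-in for one block device; all field names are hypothetical."""
    name: str
    in_flight: List[str] = field(default_factory=list)  # requests still tracked
    throttled: List[str] = field(default_factory=list)  # requests held back by throttling
    child: Optional["Disk"] = None                       # underlying file or backing image


def pending_reason(disk: Disk) -> Optional[str]:
    """Return the first condition that reports pending work, or None."""
    if disk.in_flight:
        return f"{disk.name}: in-flight requests"
    if disk.throttled:
        return f"{disk.name}: throttled requests"
    if disk.child is not None:
        # A disk also counts as busy if anything beneath it is busy.
        return pending_reason(disk.child)
    return None


def report(disks: List[Disk]) -> List[str]:
    """Collect the reason for every disk that still claims pending requests."""
    reasons = (pending_reason(d) for d in disks)
    return [r for r in reasons if r is not None]


# Example: only the backing image of the first disk has something queued.
backing = Disk("base.qcow2", throttled=["req-1"])
disks = [Disk("vda", child=backing), Disk("vdb")]
print(report(disks))
```

In a real debugging session the same information would come from inspecting the live process in gdb. The model is only meant to make clear what "which condition, for which disk" means.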