From: Paolo Bonzini
To: Stefan Hajnoczi
Cc: kwolf@redhat.com, famz@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 01/18] iothread: release iothread around aio_poll
Date: Mon, 28 Sep 2015 12:14:24 +0200
Message-ID: <56091300.4070000@redhat.com>
In-Reply-To: <20150928095026.GB8756@stefanha-thinkpad.redhat.com>
References: <1438868176-20364-1-git-send-email-pbonzini@redhat.com> <1438868176-20364-2-git-send-email-pbonzini@redhat.com> <20150928095026.GB8756@stefanha-thinkpad.redhat.com>

On 28/09/2015 11:50, Stefan Hajnoczi wrote:
> On Thu, Aug 06, 2015 at 03:35:59PM +0200, Paolo Bonzini wrote:
>> This is the first step towards having fine-grained critical sections in
>> dataplane threads, which resolves lock ordering problems between
>> address_space_* functions (which need the BQL when doing MMIO, even
>> after we complete RCU-based dispatch) and the AioContext.
>>
>> Because AioContext does not use contention callbacks anymore, the
>> unit test has to be changed.
>>
>> Previously applied as a0710f7995f914e3044e5899bd8ff6c43c62f916 and
>> then reverted.
>
> commit da5e1de95bb235330d7724316e7a29239d1359d5
> Author: Stefan Hajnoczi
> Date:   Wed Jun 3 10:15:33 2015 +0100
>
>     Revert "iothread: release iothread around aio_poll"
>
>     This reverts commit a0710f7995f914e3044e5899bd8ff6c43c62f916.
>
>     In qemu-devel email message <556DBF87.2020908@de.ibm.com>, Christian
>     Borntraeger writes:
>
>       Having many guests all with a kernel/ramdisk (via -kernel) and
>       several null block devices will result in hangs. All hanging
>       guests are in partition detection code waiting for an I/O to return
>       so very early maybe even the first I/O.
>
>       Reverting that commit "fixes" the hangs.
>
>     Reverting this commit for the 2.4 release. More time is needed to
>     investigate and correct this patch.
>
> Did we ever find the root cause for hangs caused by this patch?

It was fixed by commit 53ec73e ("block: Use bdrv_drain to replace
uncessary bdrv_drain_all", 2015-05-29)'s change to bdrv_set_aio_context.

We never investigated the root cause, but I'd guess it's gone after the
2.4-rc bugfixes to AioContext.

Paolo