From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59877) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1axahl-00083B-7C for qemu-devel@nongnu.org; Tue, 03 May 2016 09:49:43 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1axahZ-00031I-0b for qemu-devel@nongnu.org; Tue, 03 May 2016 09:49:31 -0400 Date: Tue, 3 May 2016 15:48:47 +0200 From: Kevin Wolf Message-ID: <20160503134847.GH3917@noname.str.redhat.com> References: <5720BFDB.60600@redhat.com> <20160429151826.GM4350@noname.redhat.com> <20160503132324.GE3917@noname.str.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alberto Garcia Cc: Max Reitz , qemu-devel@nongnu.org, qemu-block@nongnu.org, Eric Blake , Stefan Hajnoczi Am 03.05.2016 um 15:33 hat Alberto Garcia geschrieben: > On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote: > >> c) we fix bdrv_reopen() so we can actually run both jobs at the same > >> time. I'm wondering if pausing all block jobs between > >> bdrv_reopen_prepare() and bdrv_reopen_commit() would do the > >> trick. Opinions? > > > > I would have to read up the details of the problem again, but I think > > with bdrv_drained_begin/end() we actually have the right tool now to fix > > it properly. We may need to pull up the drain (bdrv_drain_all() today) > > from bdrv_reopen_multiple() to its caller and just assert it in the > > function itself, but there shouldn't be much more to it than that. > > I think that's not enough, see point 2) here: > > https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html > > "I've been taking a look at the bdrv_drained_begin/end() API, but as I > understand it it prevents requests from a different AioContext. > Since all BDS in the same chain share the same context it does not > really help here." Yes, that's the part I meant with pulling up the calls. If I understand correctly, the problem is that first bdrv_reopen_queue() queues a few BDSes for reopen, then bdrv_drain_all() completes all running requests and can indirectly trigger a graph modification, and then bdrv_reopen_multiple() uses the queue which doesn't match reality any more. The solution to that should be simply changing the order of things: 1. bdrv_drained_begin() 2. bdrv_reopen_queue() 3. bdrv_reopen_multiple() * Instead of bdrv_drain_all(), assert that no requests are pending * We don't run requests, so we can't complete a block job and manipulate the graph any more 4. then bdrv_drained_end() But you're right that this changed order is really the key and not bdrv_drained_begin/end(). The latter are just cleaner than the existing bdrv_drain_all(), but don't actually contribute much to the fix without dataplane. I must have misremembered its importance. Kevin