From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59877)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1axahl-00083B-7C
	for qemu-devel@nongnu.org; Tue, 03 May 2016 09:49:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1axahZ-00031I-0b
	for qemu-devel@nongnu.org; Tue, 03 May 2016 09:49:31 -0400
Date: Tue, 3 May 2016 15:48:47 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20160503134847.GH3917@noname.str.redhat.com>
References: <cover.1459776815.git.berto@igalia.com>
	<e6fa1fe465be9051316b29668c0a255341b6b1f1.1459776815.git.berto@igalia.com>
	<5720BFDB.60600@redhat.com>
	<w51h9elapr2.fsf@maestria.local.igalia.com>
	<20160429151826.GM4350@noname.redhat.com>
	<w517ffb486l.fsf@maestria.local.igalia.com>
	<20160503132324.GE3917@noname.str.redhat.com>
	<w51y47r2rmm.fsf@maestria.local.igalia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <w51y47r2rmm.fsf@maestria.local.igalia.com>
Subject: Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for
 streaming to an intermediate layer
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alberto Garcia <berto@igalia.com>
Cc: Max Reitz <mreitz@redhat.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org, Eric Blake <eblake@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>

Am 03.05.2016 um 15:33 hat Alberto Garcia geschrieben:
> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
> >> c) we fix bdrv_reopen() so we can actually run both jobs at the same
> >>    time. I'm wondering if pausing all block jobs between
> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
> >>    trick. Opinions?
> >
> > I would have to read up the details of the problem again, but I think
> > with bdrv_drained_begin/end() we actually have the right tool now to fix
> > it properly. We may need to pull up the drain (bdrv_drain_all() today)
> > from bdrv_reopen_multiple() to its caller and just assert it in the
> > function itself, but there shouldn't be much more to it than that.
> 
> I think that's not enough, see point 2) here:
> 
> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> 
>   "I've been taking a look at the bdrv_drained_begin/end() API, but as I
>    understand it it prevents requests from a different AioContext.
>    Since all BDS in the same chain share the same context it does not
>    really help here."

Yes, that's the part I meant with pulling up the calls.

If I understand correctly, the problem is that first bdrv_reopen_queue()
queues a few BDSes for reopen, then bdrv_drain_all() completes all
running requests and can indirectly trigger a graph modification, and
then bdrv_reopen_multiple() uses the queue which doesn't match reality
any more.

The solution to that should be simply changing the order of things:

1. bdrv_drained_begin()
2. bdrv_reopen_queue()
3. bdrv_reopen_multiple()
    * Instead of bdrv_drain_all(), assert that no requests are pending
    * We don't run requests, so we can't complete a block job and
      manipulate the graph any more
4. then bdrv_drained_end()

But you're right that this changed order is really the key and not
bdrv_drained_begin/end(). The latter are just cleaner than the existing
bdrv_drain_all(), but don't actually contribute much to the fix without
dataplane. I must have misremembered its importance.

Kevin