From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38371)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1b0sXd-0005Iz-KJ
	for qemu-devel@nongnu.org; Thu, 12 May 2016 11:28:47 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1b0sXa-000129-QK
	for qemu-devel@nongnu.org; Thu, 12 May 2016 11:28:44 -0400
Date: Thu, 12 May 2016 17:28:34 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20160512152834.GG4794@noname.redhat.com>
References: <5720BFDB.60600@redhat.com>
	<w51h9elapr2.fsf@maestria.local.igalia.com>
	<20160429151826.GM4350@noname.redhat.com>
	<w517ffb486l.fsf@maestria.local.igalia.com>
	<20160503132324.GE3917@noname.str.redhat.com>
	<w51y47r2rmm.fsf@maestria.local.igalia.com>
	<20160503134847.GH3917@noname.str.redhat.com>
	<w517fezo0al.fsf@maestria.local.igalia.com>
	<20160512150451.GF4794@noname.redhat.com>
	<w514ma3nwbl.fsf@maestria.local.igalia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <w514ma3nwbl.fsf@maestria.local.igalia.com>
Subject: Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for
 streaming to an intermediate layer
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alberto Garcia <berto@igalia.com>
Cc: Max Reitz <mreitz@redhat.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org, Eric Blake <eblake@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>

On 12.05.2016 at 17:13, Alberto Garcia wrote:
> On Thu 12 May 2016 05:04:51 PM CEST, Kevin Wolf wrote:
> > Am 12.05.2016 um 15:47 hat Alberto Garcia geschrieben:
> >> On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
> >> > Am 03.05.2016 um 15:33 hat Alberto Garcia geschrieben:
> >> >> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
> >> >> >> c) we fix bdrv_reopen() so we can actually run both jobs at the same
> >> >> >>    time. I'm wondering if pausing all block jobs between
> >> >> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
> >> >> >>    trick. Opinions?
> >> >> >
> >> >> > I would have to read up the details of the problem again, but I think
> >> >> > with bdrv_drained_begin/end() we actually have the right tool now to fix
> >> >> > it properly. We may need to pull up the drain (bdrv_drain_all() today)
> >> >> > from bdrv_reopen_multiple() to its caller and just assert it in the
> >> >> > function itself, but there shouldn't be much more to it than that.
> >> >> 
> >> >> I think that's not enough, see point 2) here:
> >> >> 
> >> >> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> >> >> 
> >> >>   "I've been taking a look at the bdrv_drained_begin/end() API, but as I
> >> >>    understand it it prevents requests from a different AioContext.
> >> >>    Since all BDS in the same chain share the same context it does not
> >> >>    really help here."
> >> >
> >> > Yes, that's the part I meant with pulling up the calls.
> >> >
> >> > If I understand correctly, the problem is that first bdrv_reopen_queue()
> >> > queues a few BDSes for reopen, then bdrv_drain_all() completes all
> >> > running requests and can indirectly trigger a graph modification, and
> >> > then bdrv_reopen_multiple() uses the queue which doesn't match reality
> >> > any more.
> >> >
> >> > The solution to that should be simply changing the order of things:
> >> >
> >> > 1. bdrv_drained_begin()
> >> > 2. bdrv_reopen_queue()
> >> > 3. bdrv_reopen_multiple()
> >> >     * Instead of bdrv_drain_all(), assert that no requests are pending
> >> >     * We don't run requests, so we can't complete a block job and
> >> >       manipulate the graph any more
> >> > 4. then bdrv_drained_end()
> >> 
> >> This doesn't work. Here's what happens:
> >> 
> >> 1) Block job (a) starts (block-stream).
> >> 
> >> 2) Block job (b) starts (block-stream, or block-commit).
> >> 
> >> 3) job (b) calls bdrv_reopen() and does the drain call.
> >> 
> >> 4) job (b) creates reopen_queue and calls bdrv_reopen_multiple().
> >>    There are no pending requests at this point, but job (a) is sleeping.
> >> 
> >> 5) bdrv_reopen_multiple() iterates over reopen_queue and calls
> >>    bdrv_reopen_prepare() -> bdrv_flush() -> bdrv_co_flush() ->
> >>    qemu_coroutine_yield().
> >
> > I think between here and the next step is what I don't understand.
> >
> > bdrv_reopen_multiple() is not called in coroutine context, right? All
> > block jobs use block_job_defer_to_main_loop() before they call
> > bdrv_reopen(), as far as I can see. So bdrv_flush() shouldn't take the
> > shortcut, but use a nested event loop.
> 
> When bdrv_flush() is not called in coroutine context it does
> qemu_coroutine_create() + qemu_coroutine_enter().

Right, but if the coroutine yields, we jump back to the caller, which
looks like this:

    co = qemu_coroutine_create(bdrv_flush_co_entry);
    qemu_coroutine_enter(co, &rwco);
    while (rwco.ret == NOT_DONE) {
        aio_poll(aio_context, true);
    }

So this loops until the flush has completed. The only way I can see for
something else (job (a)) to run is if aio_poll() calls into it.
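
To make the question concrete, here is a toy model of that loop. It is
plain C and not QEMU code: every toy_* name is made up, and the idea that
job (a) is woken by an event that is already queued in the same loop is
only an assumption for illustration, not something established in this
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not QEMU code: every name below is hypothetical. */

typedef void ToyHandler(void *opaque);

typedef struct ToyEvent {
    ToyHandler *fn;
    void *opaque;
} ToyEvent;

#define TOY_MAX_EVENTS 8

static ToyEvent toy_queue[TOY_MAX_EVENTS];
static size_t toy_head, toy_tail;

static bool graph_changed;  /* set when the stand-in for job (a) runs */
static bool flush_done;     /* set when the stand-in flush completes */

static void toy_schedule(ToyHandler *fn, void *opaque)
{
    assert(toy_tail < TOY_MAX_EVENTS);
    toy_queue[toy_tail++] = (ToyEvent){ fn, opaque };
}

/* Stand-in for aio_poll(): dispatch one pending event, whoever owns it. */
static bool toy_poll(void)
{
    if (toy_head == toy_tail) {
        return false;
    }
    ToyEvent ev = toy_queue[toy_head++];
    ev.fn(ev.opaque);
    return true;
}

/* Assumed wakeup of job (a); in this model it "modifies the graph". */
static void toy_job_a_wakeup(void *opaque)
{
    (void)opaque;
    graph_changed = true;
}

static void toy_flush_complete(void *opaque)
{
    (void)opaque;
    flush_done = true;
}

/*
 * Stand-in for the synchronous flush wrapper: poll until our own request
 * is done. Returns true if something else changed the graph meanwhile.
 */
static bool toy_flush_sync(void)
{
    flush_done = false;
    toy_schedule(toy_flush_complete, NULL);
    while (!flush_done) {
        bool progress = toy_poll();
        assert(progress);
        (void)progress;
    }
    return graph_changed;
}

/* Run one flush, optionally with job (a)'s wakeup already pending. */
static bool toy_demo(bool job_a_pending)
{
    toy_head = toy_tail = 0;
    graph_changed = false;
    if (job_a_pending) {
        toy_schedule(toy_job_a_wakeup, NULL);
    }
    return toy_flush_sync();
}
```

In this model toy_demo(true) reports a graph change during the flush and
toy_demo(false) does not, so the nested loop is only dangerous if it has
something belonging to job (a) to dispatch.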

> > What is it that calls into job (a) from that event loop? It can't be a
> > request completion because we already drained all requests. Is it a
> > timer?
> 
> If I didn't get it wrong it's this bit in bdrv_co_flush()
> [...]

That's the place that yields from (b), but it's not the place that calls
into (a).

> >> 6) job (a) resumes, finishes the job and removes nodes from the graph.
> >> 
> >> 7) job (b) continues with bdrv_reopen_multiple() but now reopen_queue
> >>    contains invalid pointers.
> >
> > I don't fully understand the problem yet, but as a shot in the dark,
> > would pausing block jobs in bdrv_drained_begin() help?
> 
> Yeah, my impression is that pausing all jobs during bdrv_reopen() should
> be enough.

If you base your patches on top of my queue (which I think you already
do for the most part), the nicest way to implement this would probably
be for BlockBackends to offer their users .drained_begin/end callbacks,
which the jobs then implement by pausing themselves.

We could, of course, pause block jobs directly in bdrv_drained_begin(),
but that would feel a bit hackish. So maybe try that first as a quick
experiment to see whether it helps, and if it does, we can write the
real thing afterwards.
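
A rough sketch of the callback variant, just to show the shape I have in
mind. This is a standalone toy in plain C: ToyBackend, ToyDevOps, ToyJob
and all toy_* functions are hypothetical and do not correspond to the
existing BlockBackend or block job interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the real BlockBackend API. */

typedef struct ToyDevOps {
    void (*drained_begin)(void *opaque);
    void (*drained_end)(void *opaque);
} ToyDevOps;

typedef struct ToyBackend {
    const ToyDevOps *ops;   /* callbacks registered by the backend's user */
    void *opaque;
} ToyBackend;

typedef struct ToyJob {
    int pause_count;        /* > 0 means the job must not make progress */
    int steps_run;
} ToyJob;

/* The job implements the callbacks by pausing and resuming itself. */
static void toy_job_drained_begin(void *opaque)
{
    ToyJob *job = opaque;
    job->pause_count++;
}

static void toy_job_drained_end(void *opaque)
{
    ToyJob *job = opaque;
    assert(job->pause_count > 0);
    job->pause_count--;
}

static const ToyDevOps toy_job_ops = {
    .drained_begin = toy_job_drained_begin,
    .drained_end   = toy_job_drained_end,
};

/* One unit of job work; refuses to run while the job is paused. */
static bool toy_job_step(ToyJob *job)
{
    if (job->pause_count > 0) {
        return false;
    }
    job->steps_run++;
    return true;
}

/* Notify every backend user that a drained section begins. */
static void toy_drained_begin(ToyBackend *backends, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (backends[i].ops && backends[i].ops->drained_begin) {
            backends[i].ops->drained_begin(backends[i].opaque);
        }
    }
}

/* Notify every backend user that a drained section ends. */
static void toy_drained_end(ToyBackend *backends, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (backends[i].ops && backends[i].ops->drained_end) {
            backends[i].ops->drained_end(backends[i].opaque);
        }
    }
}
```

The pause is counted rather than a flag so that nested drained sections
work: the job only resumes once the outermost section has ended. A reopen
would then sit between toy_drained_begin() and toy_drained_end(), during
which toy_job_step() does nothing.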

Kevin