Date: Tue, 17 May 2016 16:47:57 +0200
From: Kevin Wolf
Message-ID: <20160517144757.GB9802@noname.redhat.com>
References: <20160429151826.GM4350@noname.redhat.com>
 <20160503132324.GE3917@noname.str.redhat.com>
 <20160503134847.GH3917@noname.str.redhat.com>
 <20160512150451.GF4794@noname.redhat.com>
 <20160512152834.GG4794@noname.redhat.com>
Subject: Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
To: Alberto Garcia
Cc: Max Reitz, qemu-devel@nongnu.org, qemu-block@nongnu.org, Eric Blake, Stefan Hajnoczi

Am 17.05.2016 um 16:26 hat Alberto Garcia geschrieben:
> On Thu 12 May 2016 05:28:34 PM CEST, Kevin Wolf wrote:
> > Am 12.05.2016 um 17:13 hat Alberto Garcia geschrieben:
> >> On Thu 12 May 2016 05:04:51 PM CEST, Kevin Wolf wrote:
> >> > Am 12.05.2016 um 15:47 hat Alberto Garcia geschrieben:
> >> >> On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
> >> >> > Am 03.05.2016 um 15:33 hat Alberto Garcia geschrieben:
> >> >> >> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
> >> >> >> >> c) we fix bdrv_reopen() so we can actually run both jobs at the same
> >> >> >> >> time. I'm wondering if pausing all block jobs between
> >> >> >> >> bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
> >> >> >> >> trick. Opinions?
> >> >> >> >
> >> >> >> > I would have to read up the details of the problem again, but I think
> >> >> >> > with bdrv_drained_begin/end() we actually have the right tool now to fix
> >> >> >> > it properly. We may need to pull up the drain (bdrv_drain_all() today)
> >> >> >> > from bdrv_reopen_multiple() to its caller and just assert it in the
> >> >> >> > function itself, but there shouldn't be much more to it than that.
> >> >> >>
> >> >> >> I think that's not enough, see point 2) here:
> >> >> >>
> >> >> >> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> >> >> >>
> >> >> >> "I've been taking a look at the bdrv_drained_begin/end() API, but as I
> >> >> >> understand it it prevents requests from a different AioContext.
> >> >> >> Since all BDS in the same chain share the same context it does not
> >> >> >> really help here."
> >> >> >
> >> >> > Yes, that's the part I meant with pulling up the calls.
> >> >> >
> >> >> > If I understand correctly, the problem is that first bdrv_reopen_queue()
> >> >> > queues a few BDSes for reopen, then bdrv_drain_all() completes all
> >> >> > running requests and can indirectly trigger a graph modification, and
> >> >> > then bdrv_reopen_multiple() uses the queue which doesn't match reality
> >> >> > any more.
> >> >> >
> >> >> > The solution to that should be simply changing the order of things:
> >> >> >
> >> >> > 1. bdrv_drained_begin()
> >> >> > 2. bdrv_reopen_queue()
> >> >> > 3. bdrv_reopen_multiple()
> >> >> >    * Instead of bdrv_drain_all(), assert that no requests are pending
> >> >> >    * We don't run requests, so we can't complete a block job and
> >> >> >      manipulate the graph any more
> >> >> > 4. then bdrv_drained_end()
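
Just to make the intended order concrete, it would look roughly like the
sketch below. The wrapper function and its name are invented for
illustration and the signatures are written down from memory, so please
read it as a sketch, not a tested patch:

    /* Hypothetical caller, not actual QEMU code */
    static int reopen_drained(BlockDriverState *bs, QDict *options,
                              int flags, Error **errp)
    {
        BlockReopenQueue *queue;
        int ret;

        bdrv_drained_begin(bs);                              /* 1. */
        queue = bdrv_reopen_queue(NULL, bs, options, flags); /* 2. */

        /* 3. bdrv_reopen_multiple() would not call bdrv_drain_all()
         *    itself any more, it would only assert that no requests
         *    are in flight */
        ret = bdrv_reopen_multiple(queue, errp);

        bdrv_drained_end(bs);                                /* 4. */
        return ret;
    }
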
> >> >>
> >> >> This doesn't work. Here's what happens:
> >> >>
> >> >> 1) Block job (a) starts (block-stream).
> >> >>
> >> >> 2) Block job (b) starts (block-stream, or block-commit).
> >> >>
> >> >> 3) job (b) calls bdrv_reopen() and does the drain call.
> >> >>
> >> >> 4) job (b) creates reopen_queue and calls bdrv_reopen_multiple().
> >> >> There are no pending requests at this point, but job (a) is sleeping.
> >> >>
> >> >> 5) bdrv_reopen_multiple() iterates over reopen_queue and calls
> >> >> bdrv_reopen_prepare() -> bdrv_flush() -> bdrv_co_flush() ->
> >> >> qemu_coroutine_yield().
> >> >
> >> > I think between here and the next step is what I don't understand.
> >> >
> >> > bdrv_reopen_multiple() is not called in coroutine context, right? All
> >> > block jobs use block_job_defer_to_main_loop() before they call
> >> > bdrv_reopen(), as far as I can see. So bdrv_flush() shouldn't take the
> >> > shortcut, but use a nested event loop.
> >>
> >> When bdrv_flush() is not called in coroutine context it does
> >> qemu_coroutine_create() + qemu_coroutine_enter().
> >
> > Right, but if the coroutine yields, we jump back to the caller, which
> > looks like this:
> >
> >     co = qemu_coroutine_create(bdrv_flush_co_entry);
> >     qemu_coroutine_enter(co, &rwco);
> >     while (rwco.ret == NOT_DONE) {
> >         aio_poll(aio_context, true);
> >     }
> >
> > So this loops until the flush has completed. The only way I can see how
> > something else (job (a)) can run is if aio_poll() calls it.
>
> If I'm not wrong that can happen if job (a) is sleeping.
>
> If job (a) is launched with a rate limit, then the bdrv_drain() call
> will not drain it completely but return when the block job hits the I/O
> limit -> block_job_sleep_ns() -> co_aio_sleep_ns().

So timers. Currently, bdrv_drained_begin/end disables the FD handlers,
but not timers. Not sure if that's easily fixable in a generic way,
though, so maybe we need to disable timers one by one.

Pausing a block job does not actually cause the timer to be cancelled.
Maybe we need to change block_job_sleep_ns() so that the function
returns only when the job isn't paused any more. Maybe something like:

    job->busy = false;
    if (!block_job_is_paused(job)) {
        co_aio_sleep_ns(bdrv_get_aio_context(job->bs), type, ns);
        /* The job could be paused now */
    }
    if (block_job_is_paused(job)) {
        qemu_coroutine_yield();
    }
    job->busy = true;

Then it should be safe to assume that if block jobs are paused (even if
they were sleeping), they won't issue new requests any more until they
are resumed.

> >> > I don't fully understand the problem yet, but as a shot in the
> >> > dark, would pausing block jobs in bdrv_drained_begin() help?
> >>
> >> Yeah, my impression is that pausing all jobs during bdrv_reopen()
> >> should be enough.
> >
> > If you base your patches on top of my queue (which I think you already
> > do for the greatest part), the nicest way to implement this would
> > probably be that BlockBackends give their users a callback for
> > .drained_begin/end and the jobs implement that as pausing themselves.
> >
> > We could, of course, directly pause block jobs in
> > bdrv_drained_begin(), but that would feel a bit hackish. So maybe do
> > that for a quick attempt whether it helps, and if it does, we can
> > write the real thing then.
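
Just to illustrate what I mean with that callback: something along these
lines, where the ops structure and all the names are invented for the
sake of the example; only block_job_pause()/block_job_resume() exist
today.

    /* Invented interface, for illustration only */
    typedef struct BlockBackendDrainOps {
        void (*drained_begin)(void *opaque);
        void (*drained_end)(void *opaque);
    } BlockBackendDrainOps;

    /* The block job side would simply pause and resume itself */
    static void block_job_cb_drained_begin(void *opaque)
    {
        BlockJob *job = opaque;
        block_job_pause(job);
    }

    static void block_job_cb_drained_end(void *opaque)
    {
        BlockJob *job = opaque;
        block_job_resume(job);
    }

    static const BlockBackendDrainOps block_job_drain_ops = {
        .drained_begin = block_job_cb_drained_begin,
        .drained_end   = block_job_cb_drained_end,
    };

The idea would then be that a drained section invokes drained_begin on
the BlockBackend's user before draining, and drained_end when the
section ends.
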
>
> I'll try that.
>
> On a related note, bdrv_drain_all() already pauses all block jobs. The
> problem is that when it ends it resumes them, so it can happen that
> there's again pending I/O requests before bdrv_drain_all() returns.
>
> That sounds like a problem to me, or am I overlooking anything?

Essentially every bdrv_drain(_all) call is a problem, it should always
be a bdrv_drained_begin/end section. The only exception is where you
want to drain only requests from a single source that you know doesn't
send new requests (e.g. draining guest requests in vm_stop).

Kevin
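
PS: For the single-BDS case, the pattern I mean looks roughly like the
following; do_something_with() is just a placeholder for whatever the
call site does while it relies on the device being quiesced:

    /* today: new requests can arrive again right after the drain */
    bdrv_drain(bs);
    do_something_with(bs);

    /* instead: keep things quiesced for the whole critical section */
    bdrv_drained_begin(bs);
    do_something_with(bs);
    bdrv_drained_end(bs);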