From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 5 Apr 2016 09:27:39 +0800
From: Fam Zheng
Message-ID: <20160405012739.GA670@ad.usersys.redhat.com>
References: <1459519058-29864-1-git-send-email-famz@redhat.com> <20160404115734.GA10964@stefanha-x1.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160404115734.GA10964@stefanha-x1.localdomain>
Subject: Re: [Qemu-devel] [PATCH v2] block: Fix bdrv_drain in coroutine
To: Stefan Hajnoczi
Cc: Kevin Wolf, lvivier@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org, pbonzini@redhat.com

On Mon, 04/04 12:57, Stefan Hajnoczi wrote:
> On Fri, Apr 01, 2016 at 09:57:38PM +0800, Fam Zheng wrote:
> > Using the nested aio_poll() in coroutine is a bad idea. This patch
> > replaces the aio_poll loop in bdrv_drain with a BH, if called in
> > coroutine.
> >
> > For example, the bdrv_drain() in mirror.c can hang when a guest issued
> > request is pending on it in qemu_co_mutex_lock().
> >
> > Mirror coroutine in this case has just finished a request, and the block
> > job is about to complete. It calls bdrv_drain() which waits for the
> > other coroutine to complete. The other coroutine is a scsi-disk request.
> > The deadlock happens when the latter is in turn pending on the former to
> > yield/terminate, in qemu_co_mutex_lock(). The state flow is as below
> > (assuming a qcow2 image):
> >
> >   mirror coroutine                     scsi-disk coroutine
> >   -------------------------------------------------------------
> >   do last write
> >
> >     qcow2:qemu_co_mutex_lock()
> >     ...
> >                                        scsi disk read
> >
> >                                          tracked request begin
> >
> >                                          qcow2:qemu_co_mutex_lock.enter
> >
> >   qcow2:qemu_co_mutex_unlock()
> >
> >   bdrv_drain
> >     while (has tracked request)
> >       aio_poll()
> >
> > In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return
> > because the mirror coroutine is blocked in the aio_poll(blocking=true).
> >
> > With this patch, the added qemu_coroutine_yield() allows the scsi-disk
> > coroutine to make progress as expected:
> >
> >   mirror coroutine                     scsi-disk coroutine
> >   -------------------------------------------------------------
> >   do last write
> >
> >     qcow2:qemu_co_mutex_lock()
> >     ...
> >                                        scsi disk read
> >
> >                                          tracked request begin
> >
> >                                          qcow2:qemu_co_mutex_lock.enter
> >
> >   qcow2:qemu_co_mutex_unlock()
> >
> >   bdrv_drain.enter
> >   > schedule BH
> >   > qemu_coroutine_yield()
> >   >                                      qcow2:qemu_co_mutex_lock.return
> >   >                                      ...
> >                                        tracked request end
> >     ...
> >     (resumed from BH callback)
> >   bdrv_drain.return
> >   ...
> >
> > Reported-by: Laurent Vivier
> > Suggested-by: Paolo Bonzini
> > Signed-off-by: Fam Zheng
> >
> > ---
> >
> > v2: Call qemu_bh_delete() in BH callback. [Paolo]
> >     Change the loop to an assertion. [Paolo]
> >     Elaborate a bit about the fix in commit log. [Paolo]
> > ---
> >  block/io.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 41 insertions(+)
> >
> > diff --git a/block/io.c b/block/io.c
> > index c4869b9..ddcfb4c 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -253,6 +253,43 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
> >      }
> >  }
> >  
> > +typedef struct {
> > +    Coroutine *co;
> > +    BlockDriverState *bs;
> > +    QEMUBH *bh;
> > +    bool done;
> > +} BdrvCoDrainData;
> > +
> > +static void bdrv_co_drain_bh_cb(void *opaque)
> > +{
> > +    BdrvCoDrainData *data = opaque;
> > +    Coroutine *co = data->co;
> > +
> > +    qemu_bh_delete(data->bh);
> > +    bdrv_drain(data->bs);
> > +    data->done = true;
> > +    qemu_coroutine_enter(co, NULL);
> > +}
> > +
> > +static void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
> > +{
> 
> Please document why a BH is needed:
> 
> /* Calling bdrv_drain() from a BH ensures the
>  * current coroutine yields and other coroutines run if they were
>  * queued from qemu_co_queue_run_restart().
>  */

OK.

> 
> > +    BdrvCoDrainData data;
> > +
> > +    assert(qemu_in_coroutine());
> > +    data = (BdrvCoDrainData) {
> > +        .co = qemu_coroutine_self(),
> > +        .bs = bs,
> > +        .done = false,
> > +        .bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_bh_cb, &data),
> > +    };
> > +    qemu_bh_schedule(data.bh);
> > +
> > +    qemu_coroutine_yield();
> > +    /* If we are resumed from some other event (such as an aio completion or a
> > +     * timer callback), it is a bug in the caller that should be fixed. */
> > +    assert(data.done);
> > +}
> > +
> >  /*
> >   * Wait for pending requests to complete on a single BlockDriverState subtree,
> >   * and suspend block driver's internal I/O until next request arrives.
> > @@ -269,6 +306,10 @@ void bdrv_drain(BlockDriverState *bs)
> >      bool busy = true;
> >  
> >      bdrv_drain_recurse(bs);
> > +    if (qemu_in_coroutine()) {
> > +        bdrv_co_drain(bs);
> > +        return;
> > +    }
> >      while (busy) {
> >          /* Keep iterating */
> >          bdrv_flush_io_queue(bs);
> > -- 
> > 2.7.4
> 
> block/mirror.c should call bdrv_co_drain() explicitly and bdrv_drain()
> should assert(!qemu_in_coroutine()).
> 
> The reason why existing bdrv_read() and friends detect coroutine context
> at runtime is because it is meant for legacy code that runs in both
> coroutine and non-coroutine contexts.

blk_unref/bdrv_unref are called in coroutine context (.bdrv_create
implementations), and that won't work with the assertion. Should I clean up
this "legacy" code first, i.e. move bdrv_unref calls to BHs in the callers and
assert(!qemu_in_coroutine()) there too? I didn't do this because it complicates
the code somewhat.

> 
> Modern coroutine code calls the coroutine interfaces explicitly
> instead.
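
To make sure I understand the suggestion, a rough sketch of what that would
look like on top of this patch (untested, and assuming bdrv_co_drain() gets a
prototype in include/block/block.h so block/mirror.c can call it directly):

    /* block/io.c: bdrv_drain() becomes non-coroutine only */
    void bdrv_drain(BlockDriverState *bs)
    {
        bool busy = true;

        /* Coroutine callers must use bdrv_co_drain() instead. */
        assert(!qemu_in_coroutine());

        bdrv_drain_recurse(bs);
        while (busy) {
            /* Keep iterating */
            bdrv_flush_io_queue(bs);
            /* ... rest of the existing loop unchanged ... */
        }
    }

    /* block/mirror.c: mirror_run() is a coroutine_fn, so its drain call
     * becomes an explicit bdrv_co_drain(bs) instead of going through the
     * runtime check in bdrv_drain(). */

Is that what you have in mind?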