From: Paolo Bonzini <pbonzini@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <famz@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
lvivier@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2] block: Fix bdrv_drain in coroutine
Date: Mon, 4 Apr 2016 16:47:56 +0200
Message-ID: <57027E9C.9060101@redhat.com>
In-Reply-To: <20160404115734.GA10964@stefanha-x1.localdomain>
On 04/04/2016 13:57, Stefan Hajnoczi wrote:
> On Fri, Apr 01, 2016 at 09:57:38PM +0800, Fam Zheng wrote:
>> Using a nested aio_poll() in a coroutine is a bad idea. This patch
>> replaces the aio_poll loop in bdrv_drain with a BH when called in
>> coroutine context.
>>
>> For example, the bdrv_drain() in mirror.c can hang when a guest-issued
>> request is pending on it in qemu_co_mutex_lock().
>>
>> The mirror coroutine in this case has just finished a request, and the
>> block job is about to complete. It calls bdrv_drain(), which waits for the
>> other coroutine to complete. The other coroutine is a scsi-disk request.
>> The deadlock happens when the latter is in turn pending on the former to
>> yield/terminate, in qemu_co_mutex_lock(). The state flow is as below
>> (assuming a qcow2 image):
>>
>>     mirror coroutine                    scsi-disk coroutine
>>     -------------------------------------------------------------
>>     do last write
>>
>>       qcow2:qemu_co_mutex_lock()
>>       ...
>>                                         scsi disk read
>>
>>                                           tracked request begin
>>
>>                                           qcow2:qemu_co_mutex_lock.enter
>>
>>       qcow2:qemu_co_mutex_unlock()
>>
>>     bdrv_drain
>>       while (has tracked request)
>>         aio_poll()
>>
>> In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return
>> because the mirror coroutine is blocked in the aio_poll(blocking=true).
>>
>> With this patch, the added qemu_coroutine_yield() allows the scsi-disk
>> coroutine to make progress as expected:
>>
>>     mirror coroutine                    scsi-disk coroutine
>>     -------------------------------------------------------------
>>     do last write
>>
>>       qcow2:qemu_co_mutex_lock()
>>       ...
>>                                         scsi disk read
>>
>>                                           tracked request begin
>>
>>                                           qcow2:qemu_co_mutex_lock.enter
>>
>>       qcow2:qemu_co_mutex_unlock()
>>
>>     bdrv_drain.enter
>>>      schedule BH
>>>      qemu_coroutine_yield()
>>>                                          qcow2:qemu_co_mutex_lock.return
>>>                                          ...
>>                                           tracked request end
>>                                           ...
>>       (resumed from BH callback)
>>     bdrv_drain.return
>>       ...
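The deadlock hinges on one detail that Stefan's suggested comment also points at: qemu_co_mutex_unlock() does not enter the waiting coroutine directly; it queues it, and qemu_co_queue_run_restart() enters it only after the unlocking coroutine yields. A nested aio_poll() loop inside the mirror coroutine therefore never lets the scsi-disk request finish. The following is a toy, single-threaded C model of the two diagrams above, not QEMU code. Coroutines are switch-based state machines whose step() returns where they would yield, and there is a single pending BH. The names (`Co`, `co_enter()`, `toy_aio_poll()`, `run_scenario()`) are hypothetical, and a 1000-iteration cap stands in for a loop that would never end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not QEMU code: returning from step() stands in for a yield. */
typedef struct Co { void (*step)(struct Co *co); int pc; bool done; } Co;

static Co *restart_q[4];  /* woken coroutines, entered once the waker yields */
static int n_restart, tracked;  /* tracked: in-flight tracked requests */
static Co *bh_co;         /* the single pending "drain BH", NULL if none */
static bool mutex_held;   /* the qcow2 CoMutex */
static Co *mutex_waiter;  /* coroutine blocked in qemu_co_mutex_lock() */
static bool bh_done;      /* plays the role of BdrvCoDrainData.done */
static bool use_bh_fix, drain_ok;
static Co mirror, scsi;

static void co_enter(Co *co)
{
    co->step(co);  /* runs until the coroutine yields or finishes */
    /* Like qemu_co_queue_run_restart(): wakees run only after co yields. */
    while (n_restart > 0) {
        co_enter(restart_q[--n_restart]);
    }
}

/* One aio_poll() iteration: runs the pending drain BH, if any.  The BH
 * drains outside coroutine context, then re-enters the waiting caller. */
static bool toy_aio_poll(void)
{
    Co *co = bh_co;
    if (!co) {
        return false;
    }
    bh_co = NULL;  /* delete the BH before running its callback */
    while (tracked > 0 && toy_aio_poll()) { }
    bh_done = true;
    co_enter(co);
    return true;
}

static void mirror_step(Co *co)
{
    switch (co->pc) {
    case 0:                 /* do last write: take the qcow2 lock */
        mutex_held = true;
        co->pc = 1;
        return;             /* yield while the write is in flight */
    case 1:                 /* write done: qemu_co_mutex_unlock() */
        mutex_held = false;
        if (mutex_waiter) { /* the waiter is only queued for restart */
            restart_q[n_restart++] = mutex_waiter;
            mutex_waiter = NULL;
        }
        if (!use_bh_fix) {  /* old bdrv_drain(): nested polling, no yield */
            for (int i = 0; i < 1000 && tracked > 0; i++) {
                toy_aio_poll();  /* never reaches the restart queue */
            }
            drain_ok = (tracked == 0);
            co->done = true;
            return;
        }
        bh_co = co;         /* schedule the BH, then yield */
        co->pc = 2;
        return;
    default:                /* resumed from the BH callback */
        assert(bh_done);    /* any other wakeup would be a caller bug */
        drain_ok = (tracked == 0);
        co->done = true;
    }
}

static void scsi_step(Co *co)
{
    switch (co->pc) {
    case 0:                 /* scsi disk read: tracked request begin */
        tracked++;
        co->pc = 1;
        assert(mutex_held); /* mirror still holds the qcow2 lock */
        mutex_waiter = co;  /* qemu_co_mutex_lock() has to wait */
        return;
    default:                /* lock acquired, read done: request end */
        tracked--;
        co->done = true;
    }
}

/* Replays the diagram; true if the drain found no pending request. */
static bool run_scenario(bool bh_fix)
{
    use_bh_fix = bh_fix; n_restart = 0; tracked = 0;
    bh_co = NULL; bh_done = false; drain_ok = false;
    mutex_held = false; mutex_waiter = NULL;
    mirror = (Co){ mirror_step, 0, false };
    scsi = (Co){ scsi_step, 0, false };
    co_enter(&mirror);  /* last write: takes the lock, yields */
    co_enter(&scsi);    /* guest read: begins request, waits on lock */
    co_enter(&mirror);  /* write completion: unlock, then drain */
    while (toy_aio_poll()) { }  /* main loop: run BHs until idle */
    return drain_ok && mirror.done && scsi.done;
}
```

`run_scenario(false)` returns false: the nested loop gives up with the request still tracked, and the scsi-disk coroutine only runs once mirror_step() returns; in QEMU, aio_poll(blocking=true) would instead block forever. `run_scenario(true)` returns true, because yielding lets the restart queue finish the scsi-disk request before the BH performs the drain and re-enters the mirror coroutine.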
>>
>> Reported-by: Laurent Vivier <lvivier@redhat.com>
>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>
>> ---
>>
>> v2: Call qemu_bh_delete() in BH callback. [Paolo]
>> Change the loop to an assertion. [Paolo]
>> Elaborate a bit about the fix in commit log. [Paolo]
>> ---
>> block/io.c | 41 +++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 41 insertions(+)
>>
>> diff --git a/block/io.c b/block/io.c
>> index c4869b9..ddcfb4c 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -253,6 +253,43 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
>> }
>> }
>>
>> +typedef struct {
>> + Coroutine *co;
>> + BlockDriverState *bs;
>> + QEMUBH *bh;
>> + bool done;
>> +} BdrvCoDrainData;
>> +
>> +static void bdrv_co_drain_bh_cb(void *opaque)
>> +{
>> + BdrvCoDrainData *data = opaque;
>> + Coroutine *co = data->co;
>> +
>> + qemu_bh_delete(data->bh);
>> + bdrv_drain(data->bs);
>> + data->done = true;
>> + qemu_coroutine_enter(co, NULL);
>> +}
>> +
>> +static void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
>> +{
>
> Please document why a BH is needed:
>
> /* Calling bdrv_drain() from a BH ensures the
> * current coroutine yields and other coroutines run if they were
> * queued from qemu_co_queue_run_restart().
> */
>
>> + BdrvCoDrainData data;
>> +
>> + assert(qemu_in_coroutine());
>> + data = (BdrvCoDrainData) {
>> + .co = qemu_coroutine_self(),
>> + .bs = bs,
>> + .done = false,
>> + .bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_bh_cb, &data),
>> + };
>> + qemu_bh_schedule(data.bh);
>> +
>> + qemu_coroutine_yield();
>> + /* If we are resumed from some other event (such as an aio completion or a
>> + * timer callback), it is a bug in the caller that should be fixed. */
>> + assert(data.done);
>> +}
>> +
>> /*
>> * Wait for pending requests to complete on a single BlockDriverState subtree,
>> * and suspend block driver's internal I/O until next request arrives.
>> @@ -269,6 +306,10 @@ void bdrv_drain(BlockDriverState *bs)
>> bool busy = true;
>>
>> bdrv_drain_recurse(bs);
>> + if (qemu_in_coroutine()) {
>> + bdrv_co_drain(bs);
>> + return;
>> + }
>> while (busy) {
>> /* Keep iterating */
>> bdrv_flush_io_queue(bs);
>> --
>> 2.7.4
>
> block/mirror.c should call bdrv_co_drain() explicitly and bdrv_drain()
> should assert(!qemu_in_coroutine()).
>
> The reason existing bdrv_read() and friends detect coroutine context
> at runtime is that they are meant for legacy code that runs in both
> coroutine and non-coroutine contexts.
>
> Modern coroutine code calls the coroutine interfaces explicitly
> instead.
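Stefan's alternative is to make the caller choose the entry point instead of dispatching at runtime. Below is a minimal sketch of the two API shapes. It uses hypothetical stand-ins (`in_coroutine`, `toy_in_coroutine()`, `toy_drain()`, `toy_co_drain()`, `toy_drain_legacy()`) for `qemu_in_coroutine()`, `bdrv_drain()` and `bdrv_co_drain()`, and the bodies only count which path ran; they do not drain anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not QEMU code. */
static bool in_coroutine;   /* models what qemu_in_coroutine() returns */
static int nested_drains;   /* calls that would use the aio_poll() loop */
static int bh_drains;       /* calls that would schedule a BH and yield */

static bool toy_in_coroutine(void)
{
    return in_coroutine;
}

/* Non-coroutine entry point: a nested aio_poll() loop is safe here, and
 * being called from a coroutine is treated as a caller bug. */
static void toy_drain(void)
{
    assert(!toy_in_coroutine());
    nested_drains++;
}

/* coroutine_fn entry point: callers such as block/mirror.c would call
 * this directly instead of relying on runtime detection. */
static void toy_co_drain(void)
{
    assert(toy_in_coroutine());
    bh_drains++;
}

/* The v2 patch's approach (and the legacy bdrv_read() style):
 * pick the path at runtime. */
static void toy_drain_legacy(void)
{
    if (toy_in_coroutine()) {
        toy_co_drain();
    } else {
        toy_drain();
    }
}
```

With the explicit split, a coroutine caller that reaches `toy_drain()` fails an assertion immediately instead of silently switching strategy.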

For what it's worth, I suspect Fam's patch removes the need for
http://article.gmane.org/gmane.comp.emulators.qemu/401375. That's a
nice bonus. :)

Paolo
Thread overview: 8+ messages
2016-04-01 13:57 [Qemu-devel] [PATCH v2] block: Fix bdrv_drain in coroutine Fam Zheng
2016-04-01 14:14 ` Laurent Vivier
2016-04-04 11:57 ` Stefan Hajnoczi
2016-04-04 14:47 ` Paolo Bonzini [this message]
2016-04-05 1:27 ` Fam Zheng
2016-04-05 9:39 ` Stefan Hajnoczi
2016-04-05 11:15 ` Fam Zheng
2016-04-05 12:39 ` Paolo Bonzini