* [Qemu-devel] [PATCH v3] blockjob: Fix hang in block_job_finish_sync
@ 2016-01-29 2:19 Fam Zheng
From: Fam Zheng @ 2016-01-29 2:19 UTC
To: qemu-devel
Cc: Kevin Wolf, qemu-block, Jeff Cody, mreitz, stefanha, pbonzini,
jsnow
With a mirror job running on a virtio-blk dataplane disk, sending "q" to
HMP causes an infinite loop in block_job_finish_sync.
This is because aio_poll() only processes the AioContext of bs, which
has no more work to do, while the main loop BH that is scheduled to set
the job->completed flag is never processed.
Fix this by adding a flag to the BlockJob structure that tracks which
context to poll for the block job to make progress. The flag is set to
true when block_job_defer_to_main_loop() is called, and is checked in
block_job_finish_sync to determine which context to poll.
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
blockjob.c | 5 ++++-
include/block/blockjob.h | 9 +++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/blockjob.c b/blockjob.c
index 80adb9d..25e1581 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -304,7 +304,9 @@ static int block_job_finish_sync(BlockJob *job,
return -EBUSY;
}
while (!job->completed) {
- aio_poll(bdrv_get_aio_context(bs), true);
+ aio_poll(job->deferred_to_main_loop ? qemu_get_aio_context() :
+ bdrv_get_aio_context(bs),
+ true);
}
ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
block_job_unref(job);
@@ -497,6 +499,7 @@ void block_job_defer_to_main_loop(BlockJob *job,
data->aio_context = bdrv_get_aio_context(job->bs);
data->fn = fn;
data->opaque = opaque;
+ job->deferred_to_main_loop = true;
qemu_bh_schedule(data->bh);
}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index d84ccd8..550de26 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -130,6 +130,11 @@ struct BlockJob {
*/
bool ready;
+ /**
+ * Set to true when the job has deferred work to the main loop.
+ */
+ bool deferred_to_main_loop;
+
/** Status that is published by the query-block-jobs QMP API */
BlockDeviceIoStatus iostatus;
@@ -402,6 +407,10 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
* AioContext acquired. Block jobs must call bdrv_unref(), bdrv_close(), and
* anything that uses bdrv_drain_all() in the main loop.
*
+ * The job->deferred_to_main_loop flag will be set. Caller must clear it once
+ * the deferred work is done and the block job coroutine continues, unless it's
+ * completing immediately.
+ *
* The @job AioContext is held while @fn executes.
*/
void block_job_defer_to_main_loop(BlockJob *job,
--
2.4.3
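The hang described in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions: ToyContext, ToyJob and every toy_* function are hypothetical stand-ins invented for illustration, not QEMU APIs. The real aio_poll() blocks, whereas this model uses a bounded, non-blocking loop so that the failure case terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext: at most one pending callback. */
typedef struct ToyContext {
    void (*pending)(void *opaque);
    void *opaque;
} ToyContext;

/* Hypothetical stand-in for BlockJob, reduced to the fields that matter. */
typedef struct ToyJob {
    bool completed;
    bool deferred_to_main_loop;
    ToyContext *job_ctx;   /* plays the role of bdrv_get_aio_context(bs) */
    ToyContext *main_ctx;  /* plays the role of qemu_get_aio_context() */
} ToyJob;

/* Run the pending callback, if any. Returns true if progress was made. */
static bool toy_poll(ToyContext *ctx)
{
    if (!ctx->pending) {
        return false;
    }
    void (*fn)(void *) = ctx->pending;
    void *opaque = ctx->opaque;
    ctx->pending = NULL;
    fn(opaque);
    return true;
}

/* The deferred work: mark the job as completed. */
static void toy_complete_bh(void *opaque)
{
    ToyJob *job = opaque;
    job->completed = true;
}

/* Models the patched defer function: schedule on the main context and
 * record that fact in the job. */
static void toy_defer_to_main_loop(ToyJob *job)
{
    job->main_ctx->pending = toy_complete_bh;
    job->main_ctx->opaque = job;
    job->deferred_to_main_loop = true;
}

/* Bounded version of the wait loop. Returns the number of polls needed,
 * or -1 if the job did not complete within max_polls. */
static int toy_finish_sync(ToyJob *job, bool use_flag, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 1; i <= max_polls; i++) {
        ToyContext *ctx = (use_flag && job->deferred_to_main_loop)
                          ? job->main_ctx : job->job_ctx;
        toy_poll(ctx);
        if (job->completed) {
            return i;
        }
    }
    return -1;
}

/* Build a job that has already deferred its completion, then wait on it. */
static int toy_run(bool use_flag)
{
    ToyContext job_ctx = { NULL, NULL };
    ToyContext main_ctx = { NULL, NULL };
    ToyJob job = { false, false, &job_ctx, &main_ctx };

    toy_defer_to_main_loop(&job);
    return toy_finish_sync(&job, use_flag, 1000);
}
```

In this model, toy_run(false) polls only the job's own context and never observes completion, which corresponds to the hang; toy_run(true) follows the flag to the main context and finishes on the first poll.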
* Re: [Qemu-devel] [PATCH v3] blockjob: Fix hang in block_job_finish_sync
@ 2016-01-29 11:31 ` Stefan Hajnoczi
From: Stefan Hajnoczi @ 2016-01-29 11:31 UTC
To: Fam Zheng
Cc: Kevin Wolf, qemu-block, Jeff Cody, qemu-devel, mreitz, pbonzini,
jsnow
On Fri, Jan 29, 2016 at 10:19:49AM +0800, Fam Zheng wrote:
> @@ -402,6 +407,10 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
> * AioContext acquired. Block jobs must call bdrv_unref(), bdrv_close(), and
> * anything that uses bdrv_drain_all() in the main loop.
> *
> + * The job->deferred_to_main_loop flag will be set. Caller must clear it once
> + * the deferred work is done and the block job coroutine continues, unless it's
> + * completing immediately.
> + *
It's not necessary to expose job->deferred_to_main_loop to the user.
Just clear it:
    static void block_job_defer_to_main_loop_bh(void *opaque)
    {
        BlockJobDeferToMainLoopData *data = opaque;
        AioContext *aio_context;

        qemu_bh_delete(data->bh);

        /* Prevent race with block_job_defer_to_main_loop() */
        aio_context_acquire(data->aio_context);

        /* Fetch BDS AioContext again, in case it has changed */
        aio_context = bdrv_get_aio_context(data->job->bs);
        aio_context_acquire(aio_context);

        data->fn(data->job, data->opaque);
        data->job->deferred_to_main_loop = false; /* <----- HERE */

        aio_context_release(aio_context);

        aio_context_release(data->aio_context);

        g_free(data);
    }
* Re: [Qemu-devel] [PATCH v3] blockjob: Fix hang in block_job_finish_sync
@ 2016-02-01 2:49 ` Fam Zheng
From: Fam Zheng @ 2016-02-01 2:49 UTC
To: Stefan Hajnoczi
Cc: Kevin Wolf, qemu-block, Jeff Cody, qemu-devel, mreitz, pbonzini,
jsnow
On Fri, 01/29 11:31, Stefan Hajnoczi wrote:
> On Fri, Jan 29, 2016 at 10:19:49AM +0800, Fam Zheng wrote:
> > @@ -402,6 +407,10 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
> > * AioContext acquired. Block jobs must call bdrv_unref(), bdrv_close(), and
> > * anything that uses bdrv_drain_all() in the main loop.
> > *
> > + * The job->deferred_to_main_loop flag will be set. Caller must clear it once
> > + * the deferred work is done and the block job coroutine continues, unless it's
> > + * completing immediately.
> > + *
>
> It's not necessary to expose job->deferred_to_main_loop to the user.
> Just clear it:
>
> static void block_job_defer_to_main_loop_bh(void *opaque)
> {
> BlockJobDeferToMainLoopData *data = opaque;
> AioContext *aio_context;
>
> qemu_bh_delete(data->bh);
>
> /* Prevent race with block_job_defer_to_main_loop() */
> aio_context_acquire(data->aio_context);
>
> /* Fetch BDS AioContext again, in case it has changed */
> aio_context = bdrv_get_aio_context(data->job->bs);
> aio_context_acquire(aio_context);
>
> data->fn(data->job, data->opaque);
> > data->job->deferred_to_main_loop = false; /* <----- HERE */
Maybe move it one line up, in case data->fn() calls
block_job_defer_to_main_loop() again?
Fam
>
> aio_context_release(aio_context);
>
> aio_context_release(data->aio_context);
>
> g_free(data);
> }
* Re: [Qemu-devel] [PATCH v3] blockjob: Fix hang in block_job_finish_sync
@ 2016-02-01 11:36 ` Stefan Hajnoczi
From: Stefan Hajnoczi @ 2016-02-01 11:36 UTC
To: Fam Zheng
Cc: Kevin Wolf, qemu-block, Jeff Cody, qemu-devel, mreitz, pbonzini,
jsnow
On Mon, Feb 01, 2016 at 10:49:00AM +0800, Fam Zheng wrote:
> On Fri, 01/29 11:31, Stefan Hajnoczi wrote:
> > On Fri, Jan 29, 2016 at 10:19:49AM +0800, Fam Zheng wrote:
> > > @@ -402,6 +407,10 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
> > > * AioContext acquired. Block jobs must call bdrv_unref(), bdrv_close(), and
> > > * anything that uses bdrv_drain_all() in the main loop.
> > > *
> > > + * The job->deferred_to_main_loop flag will be set. Caller must clear it once
> > > + * the deferred work is done and the block job coroutine continues, unless it's
> > > + * completing immediately.
> > > + *
> >
> > It's not necessary to expose job->deferred_to_main_loop to the user.
> > Just clear it:
> >
> > static void block_job_defer_to_main_loop_bh(void *opaque)
> > {
> > BlockJobDeferToMainLoopData *data = opaque;
> > AioContext *aio_context;
> >
> > qemu_bh_delete(data->bh);
> >
> > /* Prevent race with block_job_defer_to_main_loop() */
> > aio_context_acquire(data->aio_context);
> >
> > /* Fetch BDS AioContext again, in case it has changed */
> > aio_context = bdrv_get_aio_context(data->job->bs);
> > aio_context_acquire(aio_context);
> >
> > data->fn(data->job, data->opaque);
> > > data->job->deferred_to_main_loop = false; /* <----- HERE */
>
> Maybe move it one line up, in case data->fn() calls
> block_job_defer_to_main_loop() again?
Yes, good point. Thanks for spotting the bug.
It's safe to clear the boolean as soon as we acquire aio_context.
Stefan
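The ordering issue discussed above can be checked with a small standalone model. This is a sketch under stated assumptions: ToyJob and the toy_* functions are hypothetical names made up for illustration, and the bottom half is reduced to a counter plus the flag. It compares clearing the flag after the deferred callback with clearing it before, for a callback that defers to the main loop again.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyJob ToyJob;
typedef void ToyDeferFn(ToyJob *job);

/* Hypothetical stand-in for BlockJob: the flag plus a count of pending BHs. */
struct ToyJob {
    bool deferred_to_main_loop;
    int bh_scheduled;
};

/* Models the patched defer function: schedule a BH and set the flag. */
static void toy_defer_to_main_loop(ToyJob *job)
{
    job->bh_scheduled++;
    job->deferred_to_main_loop = true;
}

/* A deferred callback that defers to the main loop a second time. */
static void toy_fn_redefers(ToyJob *job)
{
    toy_defer_to_main_loop(job);
}

/* Ordering from the first suggestion: clear the flag after the callback.
 * A nested defer sets the flag, and this then wipes it out. */
static void toy_bh_clear_after(ToyJob *job, ToyDeferFn *fn)
{
    job->bh_scheduled--;
    fn(job);
    job->deferred_to_main_loop = false;
}

/* Ordering agreed on in the follow-up: clear the flag before the callback,
 * so a nested defer leaves it set. */
static void toy_bh_clear_before(ToyJob *job, ToyDeferFn *fn)
{
    job->bh_scheduled--;
    job->deferred_to_main_loop = false;
    fn(job);
}

/* Run one BH with a re-deferring callback and report whether the flag
 * still agrees with "a BH is pending in the main loop". */
static bool toy_flag_consistent(bool clear_before)
{
    ToyJob job = { false, 0 };

    toy_defer_to_main_loop(&job);
    assert(job.bh_scheduled == 1 && job.deferred_to_main_loop);

    if (clear_before) {
        toy_bh_clear_before(&job, toy_fn_redefers);
    } else {
        toy_bh_clear_after(&job, toy_fn_redefers);
    }
    return job.deferred_to_main_loop == (job.bh_scheduled > 0);
}
```

With the clear-after ordering, the model ends with a BH pending but the flag false, so a waiter would go back to polling the job's own context; with the clear-before ordering the flag and the pending BH stay in agreement.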