* [Qemu-devel] [PATCH v2 1/4] block: count bdrv_co_rw_vmstate() requests
2017-05-19 10:32 [Qemu-devel] [PATCH v2 0/4] block: fix 'savevm' hang with -object iothread Stefan Hajnoczi
@ 2017-05-19 10:32 ` Stefan Hajnoczi
2017-05-19 10:32 ` [Qemu-devel] [PATCH v2 2/4] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate() Stefan Hajnoczi
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2017-05-19 10:32 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, Paolo Bonzini, Kevin Wolf, Fam Zheng, Stefan Hajnoczi
Call bdrv_inc/dec_in_flight() for vmstate reads/writes. This seems
unnecessary at first glance because vmstate reads/writes are done
synchronously while the guest is stopped. But we need the bdrv_wakeup()
in bdrv_dec_in_flight() so the main loop sees request completion.
Besides, it's cleaner to count vmstate reads/writes like ordinary
read/write requests.
The bdrv_wakeup() partially fixes a 'savevm' hang with -object iothread.
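As a reference, here is a minimal sketch of the accounting pattern this patch
applies; example_request() and do_io() are hypothetical stand-ins for the
vmstate code path and assume the usual block-layer declarations:

    /* Every request is bracketed by the in-flight counter.  The increment
     * makes the request visible to drain/poll loops; the decrement calls
     * bdrv_wakeup() so a main-loop waiter re-evaluates its condition. */
    static int example_request(BlockDriverState *bs)
    {
        int ret;

        bdrv_inc_in_flight(bs);
        ret = do_io(bs);          /* hypothetical synchronous work */
        bdrv_dec_in_flight(bs);   /* wakes up the main loop */
        return ret;
    }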
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/io.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/block/io.c b/block/io.c
index fdd7485..cc56e90 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1988,17 +1988,24 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
bool is_read)
{
BlockDriver *drv = bs->drv;
+ int ret = -ENOTSUP;
+
+ bdrv_inc_in_flight(bs);
if (!drv) {
- return -ENOMEDIUM;
+ ret = -ENOMEDIUM;
} else if (drv->bdrv_load_vmstate) {
- return is_read ? drv->bdrv_load_vmstate(bs, qiov, pos)
- : drv->bdrv_save_vmstate(bs, qiov, pos);
+ if (is_read) {
+ ret = drv->bdrv_load_vmstate(bs, qiov, pos);
+ } else {
+ ret = drv->bdrv_save_vmstate(bs, qiov, pos);
+ }
} else if (bs->file) {
- return bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
+ ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
}
- return -ENOTSUP;
+ bdrv_dec_in_flight(bs);
+ return ret;
}
static void coroutine_fn bdrv_co_rw_vmstate_entry(void *opaque)
--
2.9.3
* [Qemu-devel] [PATCH v2 2/4] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
2017-05-19 10:32 [Qemu-devel] [PATCH v2 0/4] block: fix 'savevm' hang with -object iothread Stefan Hajnoczi
2017-05-19 10:32 ` [Qemu-devel] [PATCH v2 1/4] block: count bdrv_co_rw_vmstate() requests Stefan Hajnoczi
@ 2017-05-19 10:32 ` Stefan Hajnoczi
2017-05-19 10:32 ` [Qemu-devel] [PATCH v2 3/4] migration: avoid recursive AioContext locking in save_vmstate() Stefan Hajnoczi
2017-05-19 10:32 ` [Qemu-devel] [PATCH v2 4/4] migration: use bdrv_drain_all_begin/end() instead of bdrv_drain_all() Stefan Hajnoczi
3 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2017-05-19 10:32 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, Paolo Bonzini, Kevin Wolf, Fam Zheng, Stefan Hajnoczi
Calling aio_poll() directly may have been fine previously, but this is
the future, man! The difference between an aio_poll() loop and
BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext
around aio_poll().
This allows the IOThread to run fd handlers or BHs to complete the
request. Failure to release the AioContext causes deadlocks.
Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object
iothread.
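Roughly speaking (a simplified sketch, not the literal macro expansion), the
loop this turns into when the main loop waits for a BDS that lives in an
IOThread is, with ctx standing for bdrv_get_aio_context(bs):

    while (data.ret == -EINPROGRESS) {
        aio_context_release(ctx);               /* let the IOThread take ctx */
        aio_poll(qemu_get_aio_context(), true); /* main loop blocks until the
                                                 * bdrv_wakeup() kick arrives */
        aio_context_acquire(ctx);
    }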
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/io.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/block/io.c b/block/io.c
index cc56e90..f0041cd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2031,9 +2031,7 @@ bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
Coroutine *co = qemu_coroutine_create(bdrv_co_rw_vmstate_entry, &data);
bdrv_coroutine_enter(bs, co);
- while (data.ret == -EINPROGRESS) {
- aio_poll(bdrv_get_aio_context(bs), true);
- }
+ BDRV_POLL_WHILE(bs, data.ret == -EINPROGRESS);
return data.ret;
}
}
--
2.9.3
* [Qemu-devel] [PATCH v2 3/4] migration: avoid recursive AioContext locking in save_vmstate()
2017-05-19 10:32 [Qemu-devel] [PATCH v2 0/4] block: fix 'savevm' hang with -object iothread Stefan Hajnoczi
2017-05-19 10:32 ` [Qemu-devel] [PATCH v2 1/4] block: count bdrv_co_rw_vmstate() requests Stefan Hajnoczi
2017-05-19 10:32 ` [Qemu-devel] [PATCH v2 2/4] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate() Stefan Hajnoczi
@ 2017-05-19 10:32 ` Stefan Hajnoczi
2017-05-19 10:32 ` [Qemu-devel] [PATCH v2 4/4] migration: use bdrv_drain_all_begin/end() instead of bdrv_drain_all() Stefan Hajnoczi
3 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2017-05-19 10:32 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, Paolo Bonzini, Kevin Wolf, Fam Zheng, Stefan Hajnoczi
AioContext was designed to allow nested acquire/release calls. It uses
a recursive mutex so callers don't need to worry about nesting...or so
we thought.
BDRV_POLL_WHILE() is used to wait for block I/O requests. It releases
the AioContext temporarily around aio_poll(). This gives IOThreads a
chance to acquire the AioContext to process I/O completions.
It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
will not be able to acquire the AioContext if it was acquired
multiple times.
Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
this patch simply avoids nested locking in save_vmstate(). It's the
simplest fix and we should step back to consider the big picture with
all the recent changes to block layer threading.
This patch is the final fix to solve 'savevm' hanging with -object
iothread.
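A contrived sketch of the failure mode; vmstate_in_flight() is just a
stand-in for the wait condition:

    aio_context_acquire(ctx);          /* outer caller, e.g. the monitor */
    aio_context_acquire(ctx);          /* nested caller; the recursive mutex
                                        * happily allows this */
    BDRV_POLL_WHILE(bs, vmstate_in_flight());  /* releases ctx only once, so
                                                * it stays locked and the
                                                * IOThread starves -> hang */
    aio_context_release(ctx);
    aio_context_release(ctx);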
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
migration/savevm.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index f5e8194..3ca319f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2150,6 +2150,14 @@ int save_vmstate(const char *name, Error **errp)
goto the_end;
}
+ /* The bdrv_all_create_snapshot() call that follows acquires the AioContext
+ * for itself. BDRV_POLL_WHILE() does not support nested locking because
+ * it only releases the lock once. Therefore synchronous I/O will deadlock
+ * unless we release the AioContext before bdrv_all_create_snapshot().
+ */
+ aio_context_release(aio_context);
+ aio_context = NULL;
+
ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
if (ret < 0) {
error_setg(errp, "Error while creating snapshot on '%s'",
@@ -2160,7 +2168,9 @@ int save_vmstate(const char *name, Error **errp)
ret = 0;
the_end:
- aio_context_release(aio_context);
+ if (aio_context) {
+ aio_context_release(aio_context);
+ }
if (saved_vm_running) {
vm_start();
}
--
2.9.3
* [Qemu-devel] [PATCH v2 4/4] migration: use bdrv_drain_all_begin/end() instead of bdrv_drain_all()
2017-05-19 10:32 [Qemu-devel] [PATCH v2 0/4] block: fix 'savevm' hang with -object iothread Stefan Hajnoczi
` (2 preceding siblings ...)
2017-05-19 10:32 ` [Qemu-devel] [PATCH v2 3/4] migration: avoid recursive AioContext locking in save_vmstate() Stefan Hajnoczi
@ 2017-05-19 10:32 ` Stefan Hajnoczi
2017-05-22 12:17 ` Kevin Wolf
3 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2017-05-19 10:32 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, Paolo Bonzini, Kevin Wolf, Fam Zheng, Stefan Hajnoczi
blk/bdrv_drain_all() only takes effect for a single instant and then
resumes block jobs, guest devices, and other external clients like the
NBD server. This can be handy when performing a synchronous drain
before terminating the program, for example.
Monitor commands usually need to quiesce I/O across an entire code
region so blk/bdrv_drain_all() is not suitable. They must use
bdrv_drain_all_begin/end() to mark the region. This prevents new I/O
requests from slipping in or worse - block jobs completing and modifying
the graph.
I audited other blk/bdrv_drain_all() callers but did not find anything
that needs a similar fix. This patch fixes the savevm/loadvm commands.
Although I haven't encountered a real-world issue, this makes the code safer.
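The resulting shape of a monitor command is (sketch only; the middle stands
in for the real savevm/loadvm body):

    bdrv_drain_all_begin();    /* quiesce guest devices, block jobs, NBD, ... */

    /* ... entire monitor command body: no new external request can slip in
     * and no block job can complete and modify the graph in here ... */

    bdrv_drain_all_end();      /* resume the external I/O sources */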
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
migration/savevm.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index 3ca319f..f134e48 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2113,6 +2113,8 @@ int save_vmstate(const char *name, Error **errp)
}
vm_stop(RUN_STATE_SAVE_VM);
+ bdrv_drain_all_begin();
+
aio_context_acquire(aio_context);
memset(sn, 0, sizeof(*sn));
@@ -2171,6 +2173,9 @@ int save_vmstate(const char *name, Error **errp)
if (aio_context) {
aio_context_release(aio_context);
}
+
+ bdrv_drain_all_end();
+
if (saved_vm_running) {
vm_start();
}
@@ -2279,7 +2284,7 @@ int load_vmstate(const char *name, Error **errp)
}
/* Flush all IO requests so they don't interfere with the new state. */
- bdrv_drain_all();
+ bdrv_drain_all_begin();
ret = bdrv_all_goto_snapshot(name, &bs);
if (ret < 0) {
@@ -2303,6 +2308,8 @@ int load_vmstate(const char *name, Error **errp)
qemu_fclose(f);
aio_context_release(aio_context);
+ bdrv_drain_all_end();
+
migration_incoming_state_destroy();
if (ret < 0) {
error_setg(errp, "Error %d while loading VM state", ret);
--
2.9.3
* Re: [Qemu-devel] [PATCH v2 4/4] migration: use bdrv_drain_all_begin/end() instead of bdrv_drain_all()
2017-05-19 10:32 ` [Qemu-devel] [PATCH v2 4/4] migration: use bdrv_drain_all_begin/end() instead of bdrv_drain_all() Stefan Hajnoczi
@ 2017-05-22 12:17 ` Kevin Wolf
2017-05-22 13:39 ` Stefan Hajnoczi
0 siblings, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2017-05-22 12:17 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel, qemu-block, Paolo Bonzini, Fam Zheng
On 19.05.2017 at 12:32, Stefan Hajnoczi wrote:
> blk/bdrv_drain_all() only takes effect for a single instant and then
> resumes block jobs, guest devices, and other external clients like the
> NBD server. This can be handy when performing a synchronous drain
> before terminating the program, for example.
>
> Monitor commands usually need to quiesce I/O across an entire code
> region so blk/bdrv_drain_all() is not suitable. They must use
> bdrv_drain_all_begin/end() to mark the region. This prevents new I/O
> requests from slipping in or worse - block jobs completing and modifying
> the graph.
>
> I audited other blk/bdrv_drain_all() callers but did not find anything
> that needs a similar fix. This patch fixes the savevm/loadvm commands.
> Although I haven't encountered a real-world issue, this makes the code
> safer.
>
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> @@ -2279,7 +2284,7 @@ int load_vmstate(const char *name, Error **errp)
> }
>
> /* Flush all IO requests so they don't interfere with the new state. */
> - bdrv_drain_all();
> + bdrv_drain_all_begin();
>
> ret = bdrv_all_goto_snapshot(name, &bs);
> if (ret < 0) {
> @@ -2303,6 +2308,8 @@ int load_vmstate(const char *name, Error **errp)
> qemu_fclose(f);
> aio_context_release(aio_context);
>
> + bdrv_drain_all_end();
> +
> migration_incoming_state_destroy();
> if (ret < 0) {
> error_setg(errp, "Error %d while loading VM state", ret);
There are a few error returns between these two places where a
bdrv_drain_all_end() is missing.
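One possible way to cover them (an illustrative sketch with a hypothetical
err_drain label, not necessarily the final fix):

    ret = bdrv_all_goto_snapshot(name, &bs);
    if (ret < 0) {
        /* report the error as before ... */
        goto err_drain;          /* jump to the cleanup instead of returning */
    }

    /* the remaining error returns funnel through the same label */

    bdrv_drain_all_end();
    return 0;

err_drain:
    bdrv_drain_all_end();
    return ret;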
Kevin
* Re: [Qemu-devel] [PATCH v2 4/4] migration: use bdrv_drain_all_begin/end() instead of bdrv_drain_all()
2017-05-22 12:17 ` Kevin Wolf
@ 2017-05-22 13:39 ` Stefan Hajnoczi
0 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2017-05-22 13:39 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Paolo Bonzini, Fam Zheng
On Mon, May 22, 2017 at 02:17:35PM +0200, Kevin Wolf wrote:
> On 19.05.2017 at 12:32, Stefan Hajnoczi wrote:
> > blk/bdrv_drain_all() only takes effect for a single instant and then
> > resumes block jobs, guest devices, and other external clients like the
> > NBD server. This can be handy when performing a synchronous drain
> > before terminating the program, for example.
> >
> > Monitor commands usually need to quiesce I/O across an entire code
> > region so blk/bdrv_drain_all() is not suitable. They must use
> > bdrv_drain_all_begin/end() to mark the region. This prevents new I/O
> > requests from slipping in or worse - block jobs completing and modifying
> > the graph.
> >
> > I audited other blk/bdrv_drain_all() callers but did not find anything
> > that needs a similar fix. This patch fixes the savevm/loadvm commands.
> > Although I haven't encountered a real-world issue, this makes the code
> > safer.
> >
> > Suggested-by: Kevin Wolf <kwolf@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> > @@ -2279,7 +2284,7 @@ int load_vmstate(const char *name, Error **errp)
> > }
> >
> > /* Flush all IO requests so they don't interfere with the new state. */
> > - bdrv_drain_all();
> > + bdrv_drain_all_begin();
> >
> > ret = bdrv_all_goto_snapshot(name, &bs);
> > if (ret < 0) {
> > @@ -2303,6 +2308,8 @@ int load_vmstate(const char *name, Error **errp)
> > qemu_fclose(f);
> > aio_context_release(aio_context);
> >
> > + bdrv_drain_all_end();
> > +
> > migration_incoming_state_destroy();
> > if (ret < 0) {
> > error_setg(errp, "Error %d while loading VM state", ret);
>
> There are a few error returns between these two places where a
> bdrv_drain_all_end() is missing.
Thank you, will fix in the next revision.
Stefan