From: Kevin Wolf <kwolf@redhat.com>
To: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: quintela@redhat.com, amit.shah@redhat.com, qemu-devel@nongnu.org,
	qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] migration: re-active images when migration fails to complete
Date: Tue, 6 Dec 2016 14:42:11 +0100
Message-ID: <20161206134211.GD4990@noname.str.redhat.com>
In-Reply-To: <1479555831-30960-1-git-send-email-zhang.zhanghailiang@huawei.com>

On 19.11.2016 at 12:43, zhanghailiang wrote:
> commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case
> which migration aborted QEMU because it didn't regain the control
> of images while some errors happened.
> 
> Actually, we have another case in that error path to abort QEMU
> because of the same reason:
>     migration_thread()
>         migration_completion()
>            bdrv_inactivate_all() ----------------> inactivate images
>            qemu_savevm_state_complete_precopy()
>                socket_writev_buffer() --------> error because destination fails
>              qemu_fflush() -------------------> set error on migration stream
>            qemu_mutex_unlock_iothread() ------> unlock
>     qmp_migrate_cancel() ---------------------> user cancelled migration
>         migrate_set_state() ------------------> set migrate CANCELLING

Important to note here: qmp_migrate_cancel() is executed by a concurrent
thread; it doesn't depend on any code paths in migration_completion().

>     migration_completion() -----------------> go on to fail_invalidate
>         if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch
>     migration_thread() -----------------------> break migration loop
>       vm_start() -----------------------------> restart guest with inactive
>                                                 images
> We failed to regain the control of images because we only regain it
> while the migration state is "active", but here users cancelled the migration
> when they found some errors happened (for example, libvirtd daemon is shutdown
> in destination unexpectedly).
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  migration/migration.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index f498ab8..0c1ee6d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1752,7 +1752,8 @@ fail_invalidate:
>      /* If not doing postcopy, vm_start() will be called: let's regain
>       * control on images.
>       */
> -    if (s->state == MIGRATION_STATUS_ACTIVE) {

This if condition tries to check whether we ran the code path that
called bdrv_inactivate_all(), so that we only try to reactivate images
if we really inactivated them first.

The problem with it is that it ignores a possible concurrent
modification of s->state.
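
To make the race concrete, here is a minimal single-threaded model. It is
not QEMU code: MigState, Model and both functions are hypothetical
stand-ins, and the concurrent qmp_migrate_cancel() is modelled as a plain
assignment that lands between the inactivation and the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not QEMU's real types. */
typedef enum { ST_ACTIVE, ST_CANCELLING } MigState;

typedef struct {
    MigState state;        /* may be changed by another thread */
    bool images_inactive;  /* true after the inactivate step */
} Model;

/* Error path as in the unpatched check: reactivate only if still ACTIVE. */
static void cleanup_checks_state(Model *m)
{
    if (m->state == ST_ACTIVE) {
        m->images_inactive = false;
    }
}

/* Returns true if the guest would be restarted with inactive images. */
static bool race_leaves_images_inactive(void)
{
    Model m = { .state = ST_ACTIVE, .images_inactive = false };

    assert(m.state == ST_ACTIVE);
    m.images_inactive = true;      /* models bdrv_inactivate_all() */
    m.state = ST_CANCELLING;       /* models the concurrent cancel */
    cleanup_checks_state(&m);      /* the state check no longer matches */

    return m.images_inactive;
}
```

In this model race_leaves_images_inactive() returns true: the state read
in the error path is not the state that was current when the images were
inactivated.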

> +    if (s->state == MIGRATION_STATUS_ACTIVE ||
> +        s->state == MIGRATION_STATUS_CANCELLING) {

This adds another state that we could end up with after a concurrent
modification, so that even in this case we undo the inactivation.

However, it is no longer limited to the cases where we inactivated the
image. It also applies to other code paths (like the postcopy one) where
we didn't inactivate images.

What saves the patch is that bdrv_invalidate_cache() is a no-op for
block devices that aren't inactivated, so calling it more often than
necessary is okay.
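
The property being relied on is idempotence. A toy illustration follows,
assuming a hypothetical ToyImage with an "inactive" flag; it does not
reproduce the real block layer logic, only the shape of the argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy image; not a QEMU BlockDriverState. */
typedef struct {
    bool inactive;
    int reactivations;  /* counts how often real work was done */
} ToyImage;

/* Toy analogue of the no-op property: only inactive images do work. */
static void toy_invalidate_cache(ToyImage *img)
{
    assert(img != NULL);
    if (!img->inactive) {
        return;  /* already active: nothing to do */
    }
    img->inactive = false;
    img->reactivations++;
}

/* Calls the reactivation three times and reports how often it did work. */
static int toy_reactivations_after_repeated_calls(void)
{
    ToyImage img = { .inactive = true, .reactivations = 0 };

    toy_invalidate_cache(&img);
    toy_invalidate_cache(&img);
    toy_invalidate_cache(&img);
    return img.reactivations;
}
```

Only the first call changes anything in this toy model, which is why
calling it on paths that never inactivated the image is harmless.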

But then, if we're going to rely on this, it would be much better to
just remove the if altogether. I can't say whether there are any other
possible values of s->state that we should consider, and by removing the
if we would be guaranteed to catch all of them.

If we don't want to rely on it, just keep a local bool that remembers
whether we inactivated images and check that here.
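
A rough sketch of the local-bool variant, again as a simplified model
rather than a patch against migration.c: SketchState, Sketch and the
function names are hypothetical, and the concurrent cancel is modelled by
the cancel_races_in parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not QEMU's real types. */
typedef enum { S_ACTIVE, S_CANCELLING } SketchState;

typedef struct {
    SketchState state;     /* shared, may change concurrently */
    bool images_inactive;
} Sketch;

/* Models a completion attempt that always ends up in the failure path. */
static void sketch_completion(Sketch *s, bool cancel_races_in)
{
    bool inactivated = false;  /* local, so no other thread can change it */

    assert(s != NULL);
    if (s->state == S_ACTIVE) {
        s->images_inactive = true;   /* models bdrv_inactivate_all() */
        inactivated = true;
    }

    if (cancel_races_in) {
        s->state = S_CANCELLING;     /* models a concurrent cancel */
    }

    /* Failure path: consult the local flag, not s->state. */
    if (inactivated) {
        s->images_inactive = false;  /* models bdrv_invalidate_cache_all() */
    }
}

/* Returns true if the images are active again after the failed attempt. */
static bool sketch_images_active_after_failure(bool cancel_races_in)
{
    Sketch s = { .state = S_ACTIVE, .images_inactive = false };

    sketch_completion(&s, cancel_races_in);
    return !s.images_inactive;
}
```

With the flag, the outcome is the same whether or not the cancel arrives
in between, because the decision no longer depends on shared state.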

>          Error *local_err = NULL;
>  
>          bdrv_invalidate_cache_all(&local_err);

So in summary, this is a horrible patch because it checks the wrong
thing, and I can't really say whether it covers everything it needs to
cover, but arguably it happens to correctly fix the outcome of a
previously failing case.

Normally I would reject such a patch and require a clean solution, but
then we're on the day of -rc3, so if you can't send v2 right away, we
might not have the time for it.

Tough call...

Kevin
