Re: [PATCH v2] migration: Handle block device inactivation failures better

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Juan Quintela <quintela@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH v2] migration: Handle block device inactivation failures better
Date: Thu, 20 Apr 2023 12:41:25 +0200	[thread overview]
Message-ID: <878remzvhm.fsf@secure.mitica> (raw)
In-Reply-To: <20230414153358.1452040-1-eblake@redhat.com> (Eric Blake's message of "Fri, 14 Apr 2023 10:33:58 -0500")

Eric Blake <eblake@redhat.com> wrote:
> Consider what happens when performing a migration between two host
> machines connected to an NFS server serving multiple block devices to
> the guest, when the NFS server becomes unavailable.  The migration
> attempts to inactivate all block devices on the source (a necessary
> step before the destination can take over); but if the NFS server is
> non-responsive, the attempt to inactivate can itself fail.  When that
> happens, the destination fails to get the migrated guest (good,
> because the source wasn't able to flush everything properly):
>
>   (qemu) qemu-kvm: load of migration failed: Input/output error
>
> at which point, our only hope for the guest is for the source to take
> back control.  With the current code base, the host outputs a message, but then appears to resume:
>
>   (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
>
>   (src qemu)info status
>    VM status: running
>
> but a second migration attempt now asserts:
>
>   (src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
>
> Whether the guest is recoverable on the source after the first failure
> is debatable, but what we do not want is to have qemu itself fail due
> to an assertion.  It looks like the problem is as follows:
>
> In migration.c:migration_completion(), the source sets 'inactivate' to
> true (since COLO is not enabled), then tries
> savevm.c:qemu_savevm_state_complete_precopy() with a request to
> inactivate block devices.  In turn, this calls
> block.c:bdrv_inactivate_all(), which fails when flushing runs up
> against the non-responsive NFS server.  With savevm failing, we are
> now left in a state where some, but not all, of the block devices have
> been inactivated; but migration_completion() then jumps to 'fail'
> rather than 'fail_invalidate' and skips an attempt to reclaim those
> those disks by calling bdrv_activate_all().  Even if we do attempt to
> reclaim disks, we aren't taking note of failure there, either.
>
> Thus, we have reached a state where the migration engine has forgotten
> all state about whether a block device is inactive, because we did not
> set s->block_inactive in enough places; so migration allows the source
> to reach vm_start() and resume execution, violating the block layer
> invariant that the guest CPUs should not be restarted while a device
> is inactive.  Note that the code in migration.c:migrate_fd_cancel()
> will also try to reactivate all block devices if s->block_inactive was
> set, but because we failed to set that flag after the first failure,
> the source assumes it has reclaimed all devices, even though it still
> has remaining inactivated devices and does not try again.  Normally,
> qmp_cont() will also try to reactivate all disks (or correctly fail if
> the disks are not reclaimable because NFS is not yet back up), but the
> auto-resumption of the source after a migration failure does not go
> through qmp_cont().  And because we have left the block layer in an
> inconsistent state with devices still inactivated, the later migration
> attempt is hitting the assertion failure.
>
> Since it is important to not resume the source with inactive disks,
> this patch marks s->block_inactive before attempting inactivation,
> rather than after succeeding, in order to prevent any vm_start() until
> it has successfully reactivated all devices.
>
> See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982
>
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.

>
> ---
>
> v2: Set s->block_inactive sooner [Juan]
> ---
>  migration/migration.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index bda47891933..cb0d42c0610 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3444,13 +3444,11 @@ static void migration_completion(MigrationState *s)
>                                              MIGRATION_STATUS_DEVICE);
>              }
>              if (ret >= 0) {
> +                s->block_inactive = inactivate;
>                  qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
>                  ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
>                                                           inactivate);
>              }
> -            if (inactivate && ret >= 0) {
> -                s->block_inactive = true;
> -            }
>          }
>          qemu_mutex_unlock_iothread();

And I still have to look at the file to understand this "simple" patch.
(simple in size, not in what it means).

I will add this to my queue, but if you are in the "mood", I would like
to remove the declaration of inactivate and change this to something like:

             if (ret >= 0) {
                 /* Colo don't stop disks in normal operation */
                 s->block_inactive = !migrate_colo_enabled();
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                 ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
                                                          s->block_inactive);
             }

Or something around that lines?

> @@ -3522,6 +3520,7 @@ fail_invalidate:
>          bdrv_activate_all(&local_err);
>          if (local_err) {
>              error_report_err(local_err);
> +            s->block_inactive = true;
>          } else {
>              s->block_inactive = false;
>          }
> base-commit: 7dbd6f8a27e30fe14adb3d5869097cddf24038d6

Just wondering, what git magic creates this line?

Thanks, Juan.

next prev parent reply	other threads:[~2023-04-20 10:42 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-14 15:33 [PATCH v2] migration: Handle block device inactivation failures better Eric Blake
2023-04-20  8:28 ` Lukas Straub
2023-04-20 10:41 ` Juan Quintela [this message]
2023-04-20 14:25   ` Eric Blake
2023-04-25  8:53     ` git-format-patch useAutoBase (was Re: [PATCH v2] migration: Handle block ... ) Juan Quintela
2023-05-02  9:54 ` [PATCH v2] migration: Handle block device inactivation failures better Kevin Wolf
2023-05-02 20:16   ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878remzvhm.fsf@secure.mitica \
    --to=quintela@redhat.com \
    --cc=eblake@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).