qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Wang, Lei" <lei4.wang@intel.com>
To: Avihai Horon <avihaih@nvidia.com>, qemu-devel@nongnu.org
Cc: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>,
	Joao Martins <joao.m.martins@oracle.com>
Subject: Re: [PATCH] migration: Don't serialize migration while can't switchover
Date: Tue, 27 Feb 2024 11:16:38 +0800	[thread overview]
Message-ID: <6fa132c5-8ed8-41a6-a70d-90230ce3ca84@intel.com> (raw)
In-Reply-To: <20240222155627.14563-1-avihaih@nvidia.com>

On 2/22/2024 23:56, Avihai Horon wrote:
> Currently, migration code serializes device data sending during pre-copy
> iterative phase. As noted in the code comment, this is done to prevent
> faster changing device from sending its data over and over.
> 
> However, with switchover-ack capability enabled, this behavior can be
> problematic and may prevent migration from converging. The problem lies
> in the fact that an earlier device may never finish sending its data and
> thus block other devices from sending theirs.
> 
> This bug was observed in several VFIO migration scenarios where some
> workload on the VM prevented RAM from ever reaching a hard zero, not
> allowing VFIO initial pre-copy data to be sent, and thus destination
> could not ack switchover. Note that the same scenario, but without
> switchover-ack, would converge.
> 
> Fix it by not serializing device data sending during pre-copy iterative
> phase if switchover was not acked yet.

Hi Avihai,

Can this bug be solved by ordering the priority of different device's handlers?

> 
> Fixes: 1b4adb10f898 ("migration: Implement switchover ack logic")
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>  migration/savevm.h    |  2 +-
>  migration/migration.c |  4 ++--
>  migration/savevm.c    | 22 +++++++++++++++-------
>  3 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 74669733dd6..d4a368b522b 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -36,7 +36,7 @@ void qemu_savevm_state_setup(QEMUFile *f);
>  bool qemu_savevm_state_guest_unplug_pending(void);
>  int qemu_savevm_state_resume_prepare(MigrationState *s);
>  void qemu_savevm_state_header(QEMUFile *f);
> -int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
> +int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy, bool can_switchover);
>  void qemu_savevm_state_cleanup(void);
>  void qemu_savevm_state_complete_postcopy(QEMUFile *f);
>  int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
> diff --git a/migration/migration.c b/migration/migration.c
> index ab21de2cadb..d8bfe1fb1b9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3133,7 +3133,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>      }
>  
>      /* Just another iteration step */
> -    qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
> +    qemu_savevm_state_iterate(s->to_dst_file, in_postcopy, can_switchover);
>      return MIG_ITERATE_RESUME;
>  }
>  
> @@ -3216,7 +3216,7 @@ static MigIterateState bg_migration_iteration_run(MigrationState *s)
>  {
>      int res;
>  
> -    res = qemu_savevm_state_iterate(s->to_dst_file, false);
> +    res = qemu_savevm_state_iterate(s->to_dst_file, false, true);
>      if (res > 0) {
>          bg_migration_completion(s);
>          return MIG_ITERATE_BREAK;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d612c8a9020..3a012796375 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1386,7 +1386,7 @@ int qemu_savevm_state_resume_prepare(MigrationState *s)
>   *   0 : We haven't finished, caller have to go again
>   *   1 : We have finished, we can go to complete phase
>   */
> -int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
> +int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy, bool can_switchover)
>  {
>      SaveStateEntry *se;
>      int ret = 1;
> @@ -1430,12 +1430,20 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
>                           "%d(%s): %d",
>                           se->section_id, se->idstr, ret);
>              qemu_file_set_error(f, ret);
> +            return ret;
>          }
> -        if (ret <= 0) {
> -            /* Do not proceed to the next vmstate before this one reported
> -               completion of the current stage. This serializes the migration
> -               and reduces the probability that a faster changing state is
> -               synchronized over and over again. */
> +
> +        if (ret == 0 && can_switchover) {
> +            /*
> +             * Do not proceed to the next vmstate before this one reported
> +             * completion of the current stage. This serializes the migration
> +             * and reduces the probability that a faster changing state is
> +             * synchronized over and over again.
> +             * Do it only if migration can switchover. If migration can't
> +             * switchover yet, do proceed to let other devices send their data
> +             * too, as this may be required for switchover to be acked and
> +             * migration to converge.
> +             */
>              break;
>          }
>      }
> @@ -1724,7 +1732,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>      qemu_savevm_state_setup(f);
>  
>      while (qemu_file_get_error(f) == 0) {
> -        if (qemu_savevm_state_iterate(f, false) > 0) {
> +        if (qemu_savevm_state_iterate(f, false, true) > 0) {
>              break;
>          }
>      }


  reply	other threads:[~2024-02-27  3:17 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-22 15:56 [PATCH] migration: Don't serialize migration while can't switchover Avihai Horon
2024-02-27  3:16 ` Wang, Lei [this message]
2024-02-28  9:56   ` Avihai Horon
2024-02-27  7:41 ` Peter Xu
2024-02-27 10:44   ` Joao Martins
2024-02-28  0:00   ` Avihai Horon
2024-02-28  3:04     ` Peter Xu
2024-02-28  9:39       ` Avihai Horon
2024-02-28 10:17         ` Peter Xu
2024-02-28 10:27           ` Avihai Horon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6fa132c5-8ed8-41a6-a70d-90230ce3ca84@intel.com \
    --to=lei4.wang@intel.com \
    --cc=avihaih@nvidia.com \
    --cc=farosas@suse.de \
    --cc=joao.m.martins@oracle.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).