Re: [Qemu-devel] [PATCH] migration: flush the bdrv before stopping VM


From: Juan Quintela <quintela@redhat.com>
To: Liang Li <liang.z.li@intel.com>
Cc: amit.shah@redhat.com, yang.z.zhang@intel.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] migration: flush the bdrv before stopping VM
Date: Tue, 17 Mar 2015 13:12:16 +0100
Message-ID: <87wq2fkelb.fsf@neno.neno>
In-Reply-To: <1426582438-9698-1-git-send-email-liang.z.li@intel.com> (Liang Li's message of "Tue, 17 Mar 2015 16:53:58 +0800")

Liang Li <liang.z.li@intel.com> wrote:
> If there are file write operations in the guest when doing live
> migration, the VM downtime will be much longer than the max_downtime,
> this is caused by bdrv_flush_all(), this function is a time consuming
> operation if there a lot of data have to be flushed to disk.
>
> By adding bdrv_flush_all() before VM stop, we can reduce the time
> consumed by bdrv_flush_all() in vm_stop_force_state, this means the
> VM down time can be reduced.
>
> The test shows this optimization can help to reduce the VM downtime
> from more than 20 seconds to about 100 milliseconds.
>
> Signed-off-by: Liang Li <liang.z.li@intel.com>


This needs further review/changes on the block layer.

First, an explanation of why I think this doesn't fix the full problem.
With this patch, we fix the case where we have a dirty block layer
but basically nothing dirtying the memory on the guest.  We are moving
the 20 seconds for the block layer flush out of max_downtime and into
the time before we decide that the amount of dirty memory is small
enough to be transferred during max_downtime.  But it is still going to
take 20 seconds to flush the block layer, and during those 20 seconds,
the amount of memory that can be dirtied is HUGE.
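
To put rough numbers on that, here is a small sketch of the arithmetic.
The helper names, the dirty rate and the bandwidth are all assumptions
picked for illustration; only the 20 second flush comes from the report
above, and none of this is QEMU code.

```c
#include <stdint.h>

/* Hypothetical helper, not QEMU code: bytes the guest can dirty while a
 * flush lasting flush_ms milliseconds runs, given a dirty rate in bytes
 * per second. */
uint64_t dirtied_during_flush(uint64_t dirty_rate, uint64_t flush_ms)
{
    return dirty_rate * flush_ms / 1000;
}

/* Hypothetical helper: milliseconds needed to send `pending` bytes at
 * `bandwidth` bytes per second (bandwidth must be non-zero). */
uint64_t transfer_ms(uint64_t pending, uint64_t bandwidth)
{
    return pending * 1000 / bandwidth;
}
```

With an assumed dirty rate of 100 MiB/s and an assumed link speed of
1 GiB/s, a 20 second flush leaves about 2000 MiB dirty, which needs
roughly 1.95 seconds to send.  Any max_downtime below that is missed,
so the migration has to keep iterating.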

I think our options are:

- Tell the block layer at the beginning of migration:
  Hey, we are migrating, could you please start flushing data now, and
  don't let the caches grow too much, please, pretty please.
  (I leave the API to the block layer.)
- Add at that point a new function:
  bdrv_flush_all_start()
  That starts writing the pages out, and we "hope" that by the time
  we have migrated all memory, the writes have also finished (so our
  last call to bdrv_flush_all() has less work to do).
- Add another function:
  int bdrv_flush_all_timeout(int timeout)
  that returns once it is done or the timeout passes, telling us whether
  it has written out all pages or the timeout has expired.  So we can go
  back to the iterative stage if it has taken too long.
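
A minimal sketch of what the third option could look like, written as
standalone C rather than against the real block layer.  Everything here
is hypothetical: SketchDevice, the sketch_* names, the millisecond unit
for the timeout and the 0 / -1 return convention are assumptions for
illustration, since bdrv_flush_all_timeout() does not exist yet.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a block device with buffered writes. */
typedef struct SketchDevice {
    uint64_t dirty_chunks;   /* chunks still waiting to be written out */
} SketchDevice;

/* Monotonic clock in milliseconds. */
int64_t sketch_now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical: write out one chunk.  Real code would issue I/O here. */
void sketch_flush_one_chunk(SketchDevice *dev)
{
    dev->dirty_chunks--;
}

/*
 * Flush every device until all are clean or timeout_ms has elapsed.
 * Returns 0 if everything was flushed, -1 if the timeout expired with
 * work still pending.
 */
int sketch_flush_all_timeout(SketchDevice *devs, int ndevs, int timeout_ms)
{
    int64_t deadline = sketch_now_ms() + timeout_ms;

    for (int i = 0; i < ndevs; i++) {
        while (devs[i].dirty_chunks > 0) {
            if (sketch_now_ms() >= deadline) {
                return -1;   /* caller should go back to iterating */
            }
            sketch_flush_one_chunk(&devs[i]);
        }
    }
    return 0;
}
```

The migration thread would call this before stopping the VM.  On 0 it
goes ahead with vm_stop_force_state(); on -1 it returns to the iterative
stage and tries again later, instead of charging the whole flush to the
downtime.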

Notice that *normally* bdrv_flush_all() is very fast; the problem is
that sometimes it gets really, really slow (NFS decided to go slow, TCP
dropped a packet, whatever).

Right now, we don't have an interface to detect those cases and go back
to the iterative stage.

So, I agree with the diagnosis that there is a problem there, but I
think that the solution is more complex than this.  You helped one
workload while making a different one worse.  I am not sure which of
the two compromises is better :-(

Does this make sense?

Later, Juan.


> ---
>  migration/migration.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 2c805f1..fc4735c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -655,6 +655,10 @@ static void *migration_thread(void *opaque)
>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>                  old_vm_running = runstate_is_running();
>  
> +                /* do flush here is aimed to shorten the VM downtime,
> +                 * bdrv_flush_all is a time consuming operation
> +                 * when the guest has done some file writing */
> +                bdrv_flush_all();
>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>                  if (ret >= 0) {
>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
