Sender: Paolo Bonzini
Message-ID: <518CA4C1.2010007@redhat.com>
Date: Fri, 10 May 2013 09:41:53 +0200
From: Paolo Bonzini
References: <1368128600-30721-1-git-send-email-chegu_vinod@hp.com> <1368128600-30721-4-git-send-email-chegu_vinod@hp.com>
In-Reply-To: <1368128600-30721-4-git-send-email-chegu_vinod@hp.com>
Subject: Re: [Qemu-devel] [RFC PATCH v5 3/3] Force auto-convegence of live migration
To: Chegu Vinod
Cc: owasserm@redhat.com, qemu-devel@nongnu.org, anthony@codemonkey.ws, quintela@redhat.com

On 09/05/2013 21:43, Chegu Vinod wrote:
> If a user chooses to turn on the auto-converge migration capability
> these changes detect the lack of convergence and throttle down the
> guest, i.e. force the VCPUs out of the guest for some duration
> and let the migration thread catch up and help converge.
>
> Verified the convergence using the following:
> - SpecJbb2005 workload running on a 20VCPU/256G guest (~80% busy)
> - OLTP-like workload running on an 80VCPU/512G guest (~80% busy)
>
> Sample results with SpecJbb2005 workload: (migrate speed set to 20Gb and
> migrate downtime set to 4 seconds).
>
> (qemu) info migrate
> capabilities: xbzrle: off auto-converge: off  <----
> Migration status: active
> total time: 1487503 milliseconds
> expected downtime: 519 milliseconds
> transferred ram: 383749347 kbytes
> remaining ram: 2753372 kbytes
> total ram: 268444224 kbytes
> duplicate: 65461532 pages
> skipped: 64901568 pages
> normal: 95750218 pages
> normal bytes: 383000872 kbytes
> dirty pages rate: 67551 pages
>
> ---
>
> (qemu) info migrate
> capabilities: xbzrle: off auto-converge: on  <----
> Migration status: completed
> total time: 241161 milliseconds
> downtime: 6373 milliseconds
> transferred ram: 28235307 kbytes
> remaining ram: 0 kbytes
> total ram: 268444224 kbytes
> duplicate: 64946416 pages
> skipped: 64903523 pages
> normal: 7044971 pages
> normal bytes: 28179884 kbytes

Almost there, and certainly much better than the previous patches. Just
a couple of comments.

> Signed-off-by: Chegu Vinod
> ---
>  arch_init.c                   | 68 +++++++++++++++++++++++++++++++++++++++++
>  include/migration/migration.h |  4 ++
>  migration.c                   |  1 +
>  3 files changed, 73 insertions(+), 0 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 49c5dc2..29788d6 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -49,6 +49,7 @@
>  #include "trace.h"
>  #include "exec/cpu-all.h"
>  #include "hw/acpi/acpi.h"
> +#include "sysemu/cpus.h"
>
>  #ifdef DEBUG_ARCH_INIT
>  #define DPRINTF(fmt, ...) \
> @@ -104,6 +105,8 @@ int graphic_depth = 15;
>  #endif
>
>  const uint32_t arch_type = QEMU_ARCH;
> +static bool mig_throttle_on;
> +
>
>  /***********************************************************/
>  /* ram save/restore */
> @@ -378,8 +381,15 @@ static void migration_bitmap_sync(void)
>      uint64_t num_dirty_pages_init = migration_dirty_pages;
>      MigrationState *s = migrate_get_current();
>      static int64_t start_time;
> +    static int64_t bytes_xfer_prev;
>      static int64_t num_dirty_pages_period;
>      int64_t end_time;
> +    int64_t bytes_xfer_now;
> +    static int dirty_rate_high_cnt;
> +
> +    if (!bytes_xfer_prev) {
> +        bytes_xfer_prev = ram_bytes_transferred();
> +    }
>
>      if (!start_time) {
>          start_time = qemu_get_clock_ms(rt_clock);
> @@ -404,6 +414,23 @@ static void migration_bitmap_sync(void)
>
>      /* more than 1 second = 1000 millisecons */
>      if (end_time > start_time + 1000) {
> +        if (migrate_auto_converge()) {
> +            /* The following detection logic can be refined later. For now:
> +               Check to see if the dirtied bytes is 50% more than the approx.
> +               amount of bytes that just got transferred since the last time we
> +               were in this routine. If that happens N times (for now N==5)
> +               we turn on the throttle down logic */
> +            bytes_xfer_now = ram_bytes_transferred();
> +            if (s->dirty_pages_rate &&
> +                ((num_dirty_pages_period * TARGET_PAGE_SIZE) >
> +                 ((bytes_xfer_now - bytes_xfer_prev) / 2))) {
> +                if (dirty_rate_high_cnt++ > 5) {
> +                    DPRINTF("Unable to converge. Throtting down guest\n");
> +                    mig_throttle_on = true;
> +                }
> +            }
> +            bytes_xfer_prev = bytes_xfer_now;
> +        }
>          s->dirty_pages_rate = num_dirty_pages_period * 1000
>              / (end_time - start_time);
>          s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
> @@ -496,6 +523,15 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>      return bytes_sent;
>  }
>
> +bool throttling_needed(void)
> +{
> +    if (!migrate_auto_converge()) {
> +        return false;
> +    }

Also return false if !runstate_is_running() please.
> +    return mig_throttle_on;
> +}
> +
>  static uint64_t bytes_transferred;
>
>  static ram_addr_t ram_save_remaining(void)
> @@ -1098,3 +1134,35 @@ TargetInfo *qmp_query_target(Error **errp)
>
>      return info;
>  }
> +
> +static void mig_delay_vcpu(void)
> +{
> +    qemu_mutex_unlock_iothread();
> +    g_usleep(50*1000);
> +    qemu_mutex_lock_iothread();
> +}
> +
> +/* Stub used for getting the vcpu out of VM and into qemu via
> +   run_on_cpu() */
> +static void mig_kick_cpu(void *opq)
> +{
> +    mig_delay_vcpu();
> +    return;
> +}

Just inline mig_delay_vcpu in here, delete the extra return, and call
this function mig_delay_vcpu.

> +/* To reduce the dirty rate explicitly disallow the VCPUs from spending
> +   much time in the VM. The migration thread will try to catch up.
> +   Workload will experience a performance drop.
> +*/
> +void migration_throttle_down(void)
> +{
> +    if (throttling_needed()) {
> +        CPUArchState *penv = first_cpu;
> +        while (penv) {
> +            qemu_mutex_lock_iothread();
> +            async_run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL);
> +            qemu_mutex_unlock_iothread();

Please hoist the lock/unlock outside the while loop.
> +            penv = penv->next_cpu;
> +        }
> +    }
> +}
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index ace91b0..68b65c6 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -129,4 +129,8 @@ int64_t migrate_xbzrle_cache_size(void);
>  int64_t xbzrle_cache_resize(int64_t new_size);
>
>  bool migrate_auto_converge(void);
> +bool throttling_needed(void);
> +void stop_throttling(void);
> +void migration_throttle_down(void);
> +
>  #endif
> diff --git a/migration.c b/migration.c
> index 570cee5..d3673a6 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -526,6 +526,7 @@ static void *migration_thread(void *opaque)
>              DPRINTF("pending size %lu max %lu\n", pending_size, max_size);
>              if (pending_size && pending_size >= max_size) {
>                  qemu_savevm_state_iterate(s->file);
> +                migration_throttle_down();

Did you try the approach of calling migration_throttle_down from
ram_save_iterate, based on how much time passed from the last
occurrence?

I would like that a bit more because in principle (especially with large
bandwidth) qemu_savevm_state_iterate() can take a long time, thus the
"duty cycle" of the auto-convergence is not predictable.

Paolo

>              } else {
>                  DPRINTF("done iterating\n");
>                  qemu_mutex_lock_iothread();
>