Sender: Paolo Bonzini
Message-ID: <518CA4C1.2010007@redhat.com>
Date: Fri, 10 May 2013 09:41:53 +0200
From: Paolo Bonzini
References: <1368128600-30721-1-git-send-email-chegu_vinod@hp.com> <1368128600-30721-4-git-send-email-chegu_vinod@hp.com>
In-Reply-To: <1368128600-30721-4-git-send-email-chegu_vinod@hp.com>
Subject: Re: [Qemu-devel] [RFC PATCH v5 3/3] Force auto-convegence of live migration
To: Chegu Vinod
Cc: owasserm@redhat.com, qemu-devel@nongnu.org, anthony@codemonkey.ws, quintela@redhat.com

On 09/05/2013 21:43, Chegu Vinod wrote:
> If a user chooses to turn on the auto-converge migration capability
> these changes detect the lack of convergence and throttle down the
> guest, i.e. force the VCPUs out of the guest for some duration
> and let the migration thread catch up and help converge.
>
> Verified the convergence using the following:
> - SpecJbb2005 workload running on a 20VCPU/256G guest (~80% busy)
> - OLTP-like workload running on an 80VCPU/512G guest (~80% busy)
>
> Sample results with SpecJbb2005 workload: (migrate speed set to 20Gb and
> migrate downtime set to 4 seconds).
>
> (qemu) info migrate
> capabilities: xbzrle: off auto-converge: off  <----
> Migration status: active
> total time: 1487503 milliseconds
> expected downtime: 519 milliseconds
> transferred ram: 383749347 kbytes
> remaining ram: 2753372 kbytes
> total ram: 268444224 kbytes
> duplicate: 65461532 pages
> skipped: 64901568 pages
> normal: 95750218 pages
> normal bytes: 383000872 kbytes
> dirty pages rate: 67551 pages
>
> ---
>
> (qemu) info migrate
> capabilities: xbzrle: off auto-converge: on  <----
> Migration status: completed
> total time: 241161 milliseconds
> downtime: 6373 milliseconds
> transferred ram: 28235307 kbytes
> remaining ram: 0 kbytes
> total ram: 268444224 kbytes
> duplicate: 64946416 pages
> skipped: 64903523 pages
> normal: 7044971 pages
> normal bytes: 28179884 kbytes

Almost there, and certainly much better than the previous patches. Just
a couple of comments.

> Signed-off-by: Chegu Vinod
> ---
>  arch_init.c                   | 68 +++++++++++++++++++++++++++++++++++++++++
>  include/migration/migration.h |  4 ++
>  migration.c                   |  1 +
>  3 files changed, 73 insertions(+), 0 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 49c5dc2..29788d6 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -49,6 +49,7 @@
>  #include "trace.h"
>  #include "exec/cpu-all.h"
>  #include "hw/acpi/acpi.h"
> +#include "sysemu/cpus.h"
>
>  #ifdef DEBUG_ARCH_INIT
>  #define DPRINTF(fmt, ...) \
> @@ -104,6 +105,8 @@ int graphic_depth = 15;
>  #endif
>
>  const uint32_t arch_type = QEMU_ARCH;
> +static bool mig_throttle_on;
> +
>
>  /***********************************************************/
>  /* ram save/restore */
> @@ -378,8 +381,15 @@ static void migration_bitmap_sync(void)
>      uint64_t num_dirty_pages_init = migration_dirty_pages;
>      MigrationState *s = migrate_get_current();
>      static int64_t start_time;
> +    static int64_t bytes_xfer_prev;
>      static int64_t num_dirty_pages_period;
>      int64_t end_time;
> +    int64_t bytes_xfer_now;
> +    static int dirty_rate_high_cnt;
> +
> +    if (!bytes_xfer_prev) {
> +        bytes_xfer_prev = ram_bytes_transferred();
> +    }
>
>      if (!start_time) {
>          start_time = qemu_get_clock_ms(rt_clock);
> @@ -404,6 +414,23 @@ static void migration_bitmap_sync(void)
>
>      /* more than 1 second = 1000 millisecons */
>      if (end_time > start_time + 1000) {
> +        if (migrate_auto_converge()) {
> +            /* The following detection logic can be refined later. For now:
> +               Check to see if the dirtied bytes is 50% more than the approx.
> +               amount of bytes that just got transferred since the last time we
> +               were in this routine. If that happens N times (for now N==5)
> +               we turn on the throttle down logic */
> +            bytes_xfer_now = ram_bytes_transferred();
> +            if (s->dirty_pages_rate &&
> +                ((num_dirty_pages_period * TARGET_PAGE_SIZE) >
> +                 ((bytes_xfer_now - bytes_xfer_prev) / 2))) {
> +                if (dirty_rate_high_cnt++ > 5) {
> +                    DPRINTF("Unable to converge. Throtting down guest\n");
> +                    mig_throttle_on = true;
> +                }
> +            }
> +            bytes_xfer_prev = bytes_xfer_now;
> +        }
>          s->dirty_pages_rate = num_dirty_pages_period * 1000
>              / (end_time - start_time);
>          s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
> @@ -496,6 +523,15 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>      return bytes_sent;
>  }
>
> +bool throttling_needed(void)
> +{
> +    if (!migrate_auto_converge()) {
> +        return false;
> +    }

Also return false if !runstate_is_running() please.
> +    return mig_throttle_on;
> +}
> +
>  static uint64_t bytes_transferred;
>
>  static ram_addr_t ram_save_remaining(void)
> @@ -1098,3 +1134,35 @@ TargetInfo *qmp_query_target(Error **errp)
>
>      return info;
>  }
> +
> +static void mig_delay_vcpu(void)
> +{
> +    qemu_mutex_unlock_iothread();
> +    g_usleep(50*1000);
> +    qemu_mutex_lock_iothread();
> +}
> +
> +/* Stub used for getting the vcpu out of VM and into qemu via
> +   run_on_cpu() */
> +static void mig_kick_cpu(void *opq)
> +{
> +    mig_delay_vcpu();
> +    return;
> +}

Just inline mig_delay_vcpu in here, delete the extra return, and call
this function mig_delay_vcpu.

> +/* To reduce the dirty rate explicitly disallow the VCPUs from spending
> +   much time in the VM. The migration thread will try to catch up.
> +   Workload will experience a performance drop.
> +*/
> +void migration_throttle_down(void)
> +{
> +    if (throttling_needed()) {
> +        CPUArchState *penv = first_cpu;
> +        while (penv) {
> +            qemu_mutex_lock_iothread();
> +            async_run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL);
> +            qemu_mutex_unlock_iothread();

Please hoist the lock/unlock outside the while loop.
> +            penv = penv->next_cpu;
> +        }
> +    }
> +}
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index ace91b0..68b65c6 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -129,4 +129,8 @@ int64_t migrate_xbzrle_cache_size(void);
>  int64_t xbzrle_cache_resize(int64_t new_size);
>
>  bool migrate_auto_converge(void);
> +bool throttling_needed(void);
> +void stop_throttling(void);
> +void migration_throttle_down(void);
> +
>  #endif
> diff --git a/migration.c b/migration.c
> index 570cee5..d3673a6 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -526,6 +526,7 @@ static void *migration_thread(void *opaque)
>              DPRINTF("pending size %lu max %lu\n", pending_size, max_size);
>              if (pending_size && pending_size >= max_size) {
>                  qemu_savevm_state_iterate(s->file);
> +                migration_throttle_down();

Did you try the approach of calling migration_throttle_down from
ram_save_iterate, based on how much time passed from the last
occurrence?

I would like that a bit more because in principle (especially with large
bandwidth) qemu_savevm_state_iterate() can take a long time, thus the
"duty cycle" of the auto-convergence is not predictable.

Paolo

>              } else {
>                  DPRINTF("done iterating\n");
>                  qemu_mutex_lock_iothread();
>