From: Chegu Vinod
Date: Sun, 23 Jun 2013 19:08:39 -0700
Message-ID: <51C7AA27.5080306@hp.com>
References: <1371218337-51891-1-git-send-email-chegu_vinod@hp.com> <1371218337-51891-4-git-send-email-chegu_vinod@hp.com> <51C2FB6B.7090401@redhat.com>
In-Reply-To: <51C2FB6B.7090401@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH v6 3/3] Force auto-convergence of live migration
To: Paolo Bonzini
Cc: owasserm@redhat.com, qemu-devel@nongnu.org, anthony@codemonkey.ws, quintela@redhat.com

On 6/20/2013 5:54 AM, Paolo Bonzini wrote:
> On 14/06/2013 15:58, Chegu Vinod wrote:
>> If a user chooses to turn on the auto-converge migration capability
>> these changes detect the lack of convergence and throttle down the
>> guest. i.e. force the VCPUs out of the guest for some duration
>> and let the migration thread catchup and help converge.
> Hi Vinod,
>
> pretty much the same comments I sent you yesterday on the obsolete
> version of the patch still apply.
>
>> Verified the convergence using the following :
>> - Java Warehouse workload running on a 20VCPU/256G guest(~80% busy)
>> - OLTP like workload running on a 80VCPU/512G guest (~80% busy)
>>
>> Sample results with Java warehouse workload : (migrate speed set to 20Gb and
>> migrate downtime set to 4seconds).
>>
>> (qemu) info migrate
>> capabilities: xbzrle: off auto-converge: off <----
>> Migration status: active
>> total time: 1487503 milliseconds
>> expected downtime: 519 milliseconds
>> transferred ram: 383749347 kbytes
>> remaining ram: 2753372 kbytes
>> total ram: 268444224 kbytes
>> duplicate: 65461532 pages
>> skipped: 64901568 pages
>> normal: 95750218 pages
>> normal bytes: 383000872 kbytes
>> dirty pages rate: 67551 pages
>>
>> ---
>>
>> (qemu) info migrate
>> capabilities: xbzrle: off auto-converge: on <----
>> Migration status: completed
>> total time: 241161 milliseconds
>> downtime: 6373 milliseconds
>> transferred ram: 28235307 kbytes
>> remaining ram: 0 kbytes
>> total ram: 268444224 kbytes
>> duplicate: 64946416 pages
>> skipped: 64903523 pages
>> normal: 7044971 pages
>> normal bytes: 28179884 kbytes
>>
>> Signed-off-by: Chegu Vinod
>> ---
>>  arch_init.c |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 85 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 5d32ecf..69c6c8c 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -104,6 +104,8 @@ int graphic_depth = 15;
>>  #endif
>>
>>  const uint32_t arch_type = QEMU_ARCH;
>> +static bool mig_throttle_on;
>> +static void throttle_down_guest_to_converge(void);
>>
>>  /***********************************************************/
>>  /* ram save/restore */
>> @@ -378,8 +380,15 @@ static void migration_bitmap_sync(void)
>>      uint64_t num_dirty_pages_init = migration_dirty_pages;
>>      MigrationState *s = migrate_get_current();
>>      static int64_t start_time;
>> +    static int64_t bytes_xfer_prev;
>>      static int64_t num_dirty_pages_period;
>>      int64_t end_time;
>> +    int64_t bytes_xfer_now;
>> +    static int dirty_rate_high_cnt;
>> +
>> +    if (!bytes_xfer_prev) {
>> +        bytes_xfer_prev = ram_bytes_transferred();
>> +    }
>>
>>      if (!start_time) {
>>          start_time = qemu_get_clock_ms(rt_clock);
>> @@ -404,6 +413,23 @@ static void migration_bitmap_sync(void)
>>
>>      /* more than 1 second = 1000 millisecons */
>>      if (end_time > start_time + 1000) {
>> +        if (migrate_auto_converge()) {
>> +            /* The following detection logic can be refined later. For now:
>> +               Check to see if the dirtied bytes is 50% more than the approx.
>> +               amount of bytes that just got transferred since the last time we
>> +               were in this routine. If that happens >N times (for now N==4)
>> +               we turn on the throttle down logic */
>> +            bytes_xfer_now = ram_bytes_transferred();
>> +            if (s->dirty_pages_rate &&
>> +                ((num_dirty_pages_period * TARGET_PAGE_SIZE) >
>> +                 ((bytes_xfer_now - bytes_xfer_prev) / 2))) {
>> +                if (dirty_rate_high_cnt++ > 4) {
> Too many parentheses, and please remove the nested if.
>
>> +                    DPRINTF("Unable to converge. Throtting down guest\n");
> Please use tracepoint instead.
>
>> +                    mig_throttle_on = true;
> Need to reset dirty_rate_high_cnt here, and both
> dirty_rate_high_cnt/mig_throttle_on if you see !migrate_auto_converge().
> This ensures that throttling does not kick in automatically if you
> disable and re-enable the feature. It also lets you remove a bunch of
> migrate_auto_converge() checks.
>
> You also need to reset dirty_rate_high_cnt/mig_throttle_on in the setup
> phase of migration.
>
>> +                }
>> +            }
>> +            bytes_xfer_prev = bytes_xfer_now;
>> +        }
>>          s->dirty_pages_rate = num_dirty_pages_period * 1000
>>              / (end_time - start_time);
>>          s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
>> @@ -628,6 +654,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>          }
>>          total_sent += bytes_sent;
>>          acct_info.iterations++;
>> +        throttle_down_guest_to_converge();
> You can use a shorter name, like check_cpu_throttling().
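Paolo's suggestions above (drop the nested if, reset dirty_rate_high_cnt when throttling triggers, and clear both dirty_rate_high_cnt and mig_throttle_on when auto-converge is disabled) can be sketched as a standalone function. This is an illustrative model, not QEMU code: the fixed 4096-byte page size, the function name check_convergence, and the harness values are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TARGET_PAGE_SIZE 4096  /* illustrative; target-dependent in QEMU */

static bool mig_throttle_on;
static int dirty_rate_high_cnt;

/* Model of the per-sync-period detection step: if the bytes dirtied in
 * the last period exceed half the bytes transferred in that period for
 * more than 4 consecutive checks, turn throttling on and reset the
 * counter.  Disabling auto-converge clears all state, so re-enabling
 * the capability starts from a clean slate, as Paolo suggests. */
static void check_convergence(bool auto_converge,
                              uint64_t dirty_pages_period,
                              uint64_t bytes_xfer_now,
                              uint64_t *bytes_xfer_prev)
{
    if (!auto_converge) {
        dirty_rate_high_cnt = 0;
        mig_throttle_on = false;
        return;
    }
    uint64_t transferred = bytes_xfer_now - *bytes_xfer_prev;
    if (dirty_pages_period * TARGET_PAGE_SIZE > transferred / 2 &&
        ++dirty_rate_high_cnt > 4) {
        dirty_rate_high_cnt = 0;
        mig_throttle_on = true;
    }
    *bytes_xfer_prev = bytes_xfer_now;
}
```

With 1000 pages dirtied per period against only ~1 MB transferred, the condition holds every period, so the fifth consecutive check flips mig_throttle_on.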
>
>>      /* we want to check in the 1st loop, just in case it was the 1st time
>>         and we had to sync the dirty bitmap.
>>         qemu_get_clock_ns() is a bit expensive, so we only check each some
>> @@ -1098,3 +1125,61 @@ TargetInfo *qmp_query_target(Error **errp)
>>
>>      return info;
>>  }
>> +
>> +static bool throttling_needed(void)
>> +{
>> +    if (!migrate_auto_converge()) {
>> +        return false;
>> +    }
>> +    return mig_throttle_on;
>> +}
>> +
>> +/* Stub function that's gets run on the vcpu when its brought out of the
>> +   VM to run inside qemu via async_run_on_cpu() */
>> +static void mig_sleep_cpu(void *opq)
>> +{
>> +    qemu_mutex_unlock_iothread();
>> +    g_usleep(30*1000);
>> +    qemu_mutex_lock_iothread();
> Letting the user specify the entity of the pause would be nice, so that
> management can ramp it up. It can be done as a follow-up by adding a
> '*value': 'int' field to MigrationCapabilityStatus (between 0 and 100,
> default 30 as above).

Thanks Paolo. With the exception of the above, which can be pursued as a
follow-up, I have incorporated all your suggested changes, re-tested, and
am sending out a v7 series (without RFC).

Thanks!
Vinod

> Paolo
>
>> +}
>> +
>> +/* To reduce the dirty rate explicitly disallow the VCPUs from spending
>> +   much time in the VM. The migration thread will try to catchup.
>> +   Workload will experience a performance drop.
>> +*/
>> +static void mig_throttle_cpu_down(CPUState *cpu, void *data)
>> +{
>> +    async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
>> +}
>> +
>> +static void mig_throttle_guest_down(void)
>> +{
>> +    if (throttling_needed()) {
> No need for this "if", it is done already in the caller.
>
>> +        qemu_mutex_lock_iothread();
>> +        qemu_for_each_cpu(mig_throttle_cpu_down, NULL);
>> +        qemu_mutex_unlock_iothread();
>> +    }
>> +}
>> +
>> +static void throttle_down_guest_to_converge(void)
>> +{
>> +    static int64_t t0;
>> +    int64_t t1;
>> +
>> +    if (!throttling_needed()) {
> With the above suggested changes, this can simply check mig_throttle_on.
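Paolo's follow-up idea — a '*value': 'int' field (0 to 100, default 30) so management can ramp the pause up — could map the value to the per-interval sleep. The sketch below is purely hypothetical: the patch hard-codes g_usleep(30*1000), and treating the value as milliseconds of sleep per throttling interval is just one plausible reading, not anything the thread settled on.

```c
#include <stdint.h>

/* Hypothetical mapping from a 0-100 throttle value to the vCPU sleep
 * time per throttling interval, clamped to the valid range.
 * value == 30 reproduces the patch's hard-coded g_usleep(30*1000). */
static uint64_t throttle_sleep_us(int value)
{
    if (value < 0) {
        value = 0;
    } else if (value > 100) {
        value = 100;
    }
    return (uint64_t)value * 1000;  /* milliseconds -> microseconds */
}
```

A larger value would make each forced vCPU pause longer, raising migration throughput at the cost of guest performance.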
>
>> +        return;
>> +    }
>> +
>> +    if (!t0) {
>> +        t0 = qemu_get_clock_ns(rt_clock);
>> +        return;
>> +    }
>> +
>> +    t1 = qemu_get_clock_ns(rt_clock);
>> +
>> +    /* If it has been more than 40 ms since the last time the guest
>> +     * was throtled then do it again.
>> +     */
> throttled
>
>> +    if (((t1-t0)/1000000) > 40) {
> I prefer moving the multiplication to the right so you don't need
> parentheses, but this is _really_ a nit...
>
> Paolo
>
>> +        mig_throttle_guest_down();
>> +        t0 = t1;
>> +    }
>> +}
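The 40 ms pacing logic above can be isolated into a small helper for clarity. This is a sketch with made-up names (throttle_due, last_ns) that takes plain nanosecond timestamps instead of calling qemu_get_clock_ns(), and writes the comparison with the multiplication on the right as Paolo prefers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the pacing check in throttle_down_guest_to_converge():
 * throttle at most once per 40 ms.  The first call only records a
 * timestamp; later calls fire once more than 40 ms have elapsed,
 * rearming the timer each time they do. */
static bool throttle_due(int64_t now_ns, int64_t *last_ns)
{
    if (*last_ns == 0) {
        *last_ns = now_ns;      /* first call: arm the timer only */
        return false;
    }
    if (now_ns - *last_ns > 40 * 1000000LL) {
        *last_ns = now_ns;      /* fired: restart the 40 ms window */
        return true;
    }
    return false;
}
```

The caller would invoke mig_throttle_guest_down() whenever this returns true, keeping the clock handling out of the throttling path itself.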