From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <51C8722C.10909@hp.com>
Date: Mon, 24 Jun 2013 09:22:04 -0700
From: Chegu Vinod
MIME-Version: 1.0
References: <1372067259-141032-1-git-send-email-chegu_vinod@hp.com> <51C86CD8.8080203@redhat.com>
In-Reply-To: <51C86CD8.8080203@redhat.com>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v8 3/3] Force auto-convegence of live migration
To: Paolo Bonzini
Cc: owasserm@redhat.com, qemu-devel@nongnu.org, anthony@codemonkey.ws, quintela@redhat.com

On 6/24/2013 8:59 AM, Paolo Bonzini wrote:
> On 24/06/2013 11:47, Chegu Vinod wrote:
>> If a user chooses to turn on the auto-converge migration capability,
>> these changes detect the lack of convergence and throttle down the
>> guest, i.e. force the VCPUs out of the guest for some duration
>> and let the migration thread catch up and help converge.
>>
>> Verified the convergence using the following:
>> - Java Warehouse workload running on a 20VCPU/256G guest (~80% busy)
>> - OLTP-like workload running on an 80VCPU/512G guest (~80% busy)
>>
>> Sample results with the Java warehouse workload (migrate speed set to 20Gb
>> and migrate downtime set to 4 seconds):
>>
>> (qemu) info migrate
>> capabilities: xbzrle: off auto-converge: off  <----
>> Migration status: active
>> total time: 1487503 milliseconds
>> expected downtime: 519 milliseconds
>> transferred ram: 383749347 kbytes
>> remaining ram: 2753372 kbytes
>> total ram: 268444224 kbytes
>> duplicate: 65461532 pages
>> skipped: 64901568 pages
>> normal: 95750218 pages
>> normal bytes: 383000872 kbytes
>> dirty pages rate: 67551 pages
>>
>> ---
>>
>> (qemu) info migrate
>> capabilities: xbzrle: off auto-converge: on  <----
>> Migration status: completed
>> total time: 241161 milliseconds
>> downtime: 6373 milliseconds
>> transferred ram: 28235307 kbytes
>> remaining ram: 0 kbytes
>> total ram: 268444224 kbytes
>> duplicate: 64946416 pages
>> skipped: 64903523 pages
>> normal: 7044971 pages
>> normal bytes: 28179884 kbytes
>>
>> Signed-off-by: Chegu Vinod
> As far as the algorithm is concerned,
>
> Reviewed-by: Paolo Bonzini

Thanks!

>
> but are you sure that this passes checkpatch.pl?

Yes, it does (I had checked it before posting).

# ./scripts/checkpatch.pl 0003-Force-auto-convegence-of-live-migration.patch
total: 0 errors, 0 warnings, 114 lines checked

0003-Force-auto-convegence-of-live-migration.patch has no obvious style
problems and is ready for submission.

Vinod

>
>> +        /* The following detection logic can be refined later. For now:
>> +           Check to see if the dirtied bytes is 50% more than the approx.
>> +           amount of bytes that just got transferred since the last time we
>> +           were in this routine. If that happens >N times (for now N==4)
>> +           we turn on the throttle down logic */
>> +        bytes_xfer_now = ram_bytes_transferred();
>> +        if (s->dirty_pages_rate &&
>> +            (num_dirty_pages_period * TARGET_PAGE_SIZE >
>> +                (bytes_xfer_now - bytes_xfer_prev)/2) &&
>> +            (dirty_rate_high_cnt++ > 4)) {
> the spacing of the operators here looks like something checkpatch.pl
> would complain about.
If you have to respin for that, keep my R-b and
> please also remove all other superfluous parentheses.
>
> Paolo
>
>> +            trace_migration_throttle();
>> +            mig_throttle_on = true;
>> +            dirty_rate_high_cnt = 0;
>> +        }
>> +        bytes_xfer_prev = bytes_xfer_now;
>> +    } else {
>> +        mig_throttle_on = false;
>> +    }
>>      s->dirty_pages_rate = num_dirty_pages_period * 1000
>>          / (end_time - start_time);
>>      s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
>> @@ -566,6 +592,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>      migration_bitmap = bitmap_new(ram_pages);
>>      bitmap_set(migration_bitmap, 0, ram_pages);
>>      migration_dirty_pages = ram_pages;
>> +    mig_throttle_on = false;
>> +    dirty_rate_high_cnt = 0;
>>
>>      if (migrate_use_xbzrle()) {
>>          XBZRLE.cache = cache_init(migrate_xbzrle_cache_size() /
>> @@ -628,6 +656,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>      }
>>      total_sent += bytes_sent;
>>      acct_info.iterations++;
>> +    check_guest_throttling();
>>      /* we want to check in the 1st loop, just in case it was the 1st time
>>         and we had to sync the dirty bitmap.
>>         qemu_get_clock_ns() is a bit expensive, so we only check each some
>> @@ -1097,3 +1126,53 @@ TargetInfo *qmp_query_target(Error **errp)
>>
>>      return info;
>>  }
>> +
>> +/* Stub function that's gets run on the vcpu when its brought out of the
>> +   VM to run inside qemu via async_run_on_cpu()*/
>> +static void mig_sleep_cpu(void *opq)
>> +{
>> +    qemu_mutex_unlock_iothread();
>> +    g_usleep(30*1000);
>> +    qemu_mutex_lock_iothread();
>> +}
>> +
>> +/* To reduce the dirty rate explicitly disallow the VCPUs from spending
>> +   much time in the VM. The migration thread will try to catchup.
>> +   Workload will experience a performance drop.
>> +*/
>> +static void mig_throttle_cpu_down(CPUState *cpu, void *data)
>> +{
>> +    async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
>> +}
>> +
>> +static void mig_throttle_guest_down(void)
>> +{
>> +    qemu_mutex_lock_iothread();
>> +    qemu_for_each_cpu(mig_throttle_cpu_down, NULL);
>> +    qemu_mutex_unlock_iothread();
>> +}
>> +
>> +static void check_guest_throttling(void)
>> +{
>> +    static int64_t t0;
>> +    int64_t t1;
>> +
>> +    if (!mig_throttle_on) {
>> +        return;
>> +    }
>> +
>> +    if (!t0) {
>> +        t0 = qemu_get_clock_ns(rt_clock);
>> +        return;
>> +    }
>> +
>> +    t1 = qemu_get_clock_ns(rt_clock);
>> +
>> +    /* If it has been more than 40 ms since the last time the guest
>> +     * was throttled then do it again.
>> +     */
>> +    if (40 < (t1-t0)/1000000) {
>> +        mig_throttle_guest_down();
>> +        t0 = t1;
>> +    }
>> +}