From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 2 Jun 2015 14:58:55 +0100
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20150602135855.GC2139@work-vm>
References: <1433171851-18507-1-git-send-email-jjherne@linux.vnet.ibm.com>
 <1433171851-18507-3-git-send-email-jjherne@linux.vnet.ibm.com>
 <20150601153259.GK2314@work-vm>
 <556C936F.8090606@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <556C936F.8090606@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
To: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
Cc: amit.shah@redhat.com, borntraeger@de.ibm.com, quintela@redhat.com, qemu-devel@nongnu.org, afaerber@suse.de

* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> On 06/01/2015 11:32 AM, Dr. David Alan Gilbert wrote:
> >* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> >>Remove traditional auto-converge static 30ms throttling code and replace it
> >>with a dynamic throttling algorithm.
> >>
> >>Additionally, be more aggressive when deciding when to start throttling.
> >>Previously we waited until four unproductive memory passes. Now we begin
> >>throttling after only two unproductive memory passes. Four seemed quite
> >>arbitrary and only waiting for two passes allows us to complete the migration
> >>faster.
> >>
> >>Signed-off-by: Jason J. Herne
> >>Reviewed-by: Matthew Rosato
> >>---
> >> arch_init.c           | 95 +++++++++++++++++----------------------------------
> >> migration/migration.c |  9 +++++
> >> 2 files changed, 41 insertions(+), 63 deletions(-)
> >>
> >>diff --git a/arch_init.c b/arch_init.c
> >>index 23d3feb..73ae494 100644
> >>--- a/arch_init.c
> >>+++ b/arch_init.c
> >>@@ -111,9 +111,7 @@ int graphic_depth = 32;
> >> #endif
> >>
> >> const uint32_t arch_type = QEMU_ARCH;
> >>-static bool mig_throttle_on;
> >> static int dirty_rate_high_cnt;
> >>-static void check_guest_throttling(void);
> >>
> >> static uint64_t bitmap_sync_count;
> >>
> >>@@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
> >>     return size;
> >> }
> >>
> >>+/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
> >>+ * If guest dirty memory rate is reduced below the rate at which we can
> >>+ * transfer pages to the destination then we should be able to complete
> >>+ * migration. Some workloads dirty memory way too fast and will not effectively
> >>+ * converge, even with auto-converge. For these workloads we will continue to
> >>+ * increase throttling until the guest is paused long enough to complete the
> >>+ * migration. This essentially becomes a non-live migration.
> >>+ */
> >>+static void mig_throttle_guest_down(void)
> >>+{
> >>+    CPUState *cpu;
> >>+
> >>+    CPU_FOREACH(cpu) {
> >>+        /* We have not started throttling yet. Lets start it.*/
> >>+        if (!cpu_throttle_active(cpu)) {
> >>+            cpu_throttle_start(cpu, 0.2);
> >>+        }
> >>+
> >>+        /* Throttling is already in place. Just increase the throttling rate */
> >>+        else {
> >>+            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
> >>+        }
> >
> >Now that migration has migrate_parameters, it would be best to replace
> >the magic numbers (the 0.2, the *2 - anything else?) by parameters that can
> >change the starting throttling and increase rate. It would probably also be
> >good to make the current throttling rate visible in info somewhere; maybe
> >info migrate?
> >
>
> I did consider all of this. However, I don't think that the controls this
> patch provides are an ideal throttling mechanism. I suspect someone with
> vcpu/scheduling experience could whip up something more user-friendly and
> cleaner. I merely propose this because it seems better than what we have
> today for auto-converge.
>
> Also, I'm not sure how useful the information really is to the user. The
> fact that it is a ratio instead of a percentage might be confusing. Also,
> I suspect it is not truly very accurate. Again, I was going for "make it
> better", not "make it perfect".
>
> Lastly, if we define this external interface we are kind of stuck with it,
> yes?

Well, one thing you can do is add a parameter with a name starting with "x-",
which means it's not a fixed interface (so things like libvirt won't use it).

And the reason I was interested in seeing the information was that otherwise
we don't really have any way of knowing how well the code is working; is it
already throttling down more and more?

> In this regard we should be sure that this is how we want cpu throttling
> to work.
>
> Alternatively, I propose to accept this patch set as-is and then work on a
> real vcpu throttling mechanism that can be used for auto-converge as well
> as a user-controllable guest throttling/limiting mechanism. Once that is in
> place we can migrate (no pun intended) the auto-converge code to the new
> way and remove this stuff.

Yes, it's probably still better than what we already have.

Dave

> With all of that said, I'm willing to provide the requested controls if we
> really feel the pros outweigh the cons. Thanks for your review :).
>
> ...
>
> --
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
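
As an aside for readers following the thread: the escalation policy under
discussion is easy to model outside QEMU. The standalone sketch below is
illustrative only and is not QEMU code; the initial ratio (0.2) and the
doubling step come from the patch above, while the 1.0 clamp is an assumption
added purely for display (the posted hunk does not show a cap), and these are
exactly the magic numbers the review suggests turning into migration
parameters.

#include <stdio.h>

int main(void)
{
    /* Illustrative stand-ins for the patch's magic numbers; the 1.0 clamp
     * below is an assumption for display, not something shown in the patch. */
    const double initial_ratio = 0.2;
    const double growth_factor = 2.0;
    double ratio = 0.0;              /* 0.0 == throttling not yet active */

    for (int unproductive_pass = 1; unproductive_pass <= 6; unproductive_pass++) {
        if (ratio == 0.0) {
            ratio = initial_ratio;   /* start throttling */
        } else {
            ratio *= growth_factor;  /* escalate throttling */
        }
        if (ratio > 1.0) {
            ratio = 1.0;             /* clamp for display only */
        }
        printf("unproductive pass %d -> throttle ratio %.2f\n",
               unproductive_pass, ratio);
    }
    return 0;
}

With these numbers the modelled ratio runs 0.20, 0.40, 0.80 and then
saturates after a few escalations, which is why having the current value
visible somewhere like "info migrate" would make it much easier to tell
whether auto-converge is actually biting.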