From: "Jason J. Herne"
Date: Tue, 02 Jun 2015 10:37:19 -0400
Message-ID: <556DBF9F.9000004@linux.vnet.ibm.com>
In-Reply-To: <20150602135855.GC2139@work-vm>
Subject: Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for auto-converge
Reply-To: jjherne@linux.vnet.ibm.com
To: "Dr. David Alan Gilbert"
Cc: amit.shah@redhat.com, borntraeger@de.ibm.com, quintela@redhat.com, qemu-devel@nongnu.org, afaerber@suse.de

On 06/02/2015 09:58 AM, Dr. David Alan Gilbert wrote:
> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>> On 06/01/2015 11:32 AM, Dr. David Alan Gilbert wrote:
>>> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>>>> Remove traditional auto-converge static 30ms throttling code and replace it
>>>> with a dynamic throttling algorithm.
>>>>
>>>> Additionally, be more aggressive when deciding when to start throttling.
>>>> Previously we waited until four unproductive memory passes. Now we begin
>>>> throttling after only two unproductive memory passes. Four seemed quite
>>>> arbitrary and only waiting for two passes allows us to complete the migration
>>>> faster.
>>>>
>>>> Signed-off-by: Jason J. Herne
>>>> Reviewed-by: Matthew Rosato
>>>> ---
>>>>  arch_init.c           | 95 +++++++++++++++++----------------------------------
>>>>  migration/migration.c |  9 +++++
>>>>  2 files changed, 41 insertions(+), 63 deletions(-)
>>>>
>>>> diff --git a/arch_init.c b/arch_init.c
>>>> index 23d3feb..73ae494 100644
>>>> --- a/arch_init.c
>>>> +++ b/arch_init.c
>>>> @@ -111,9 +111,7 @@ int graphic_depth = 32;
>>>>  #endif
>>>>
>>>>  const uint32_t arch_type = QEMU_ARCH;
>>>> -static bool mig_throttle_on;
>>>>  static int dirty_rate_high_cnt;
>>>> -static void check_guest_throttling(void);
>>>>
>>>>  static uint64_t bitmap_sync_count;
>>>>
>>>> @@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
>>>>      return size;
>>>>  }
>>>>
>>>> +/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
>>>> + * If guest dirty memory rate is reduced below the rate at which we can
>>>> + * transfer pages to the destination then we should be able to complete
>>>> + * migration. Some workloads dirty memory way too fast and will not effectively
>>>> + * converge, even with auto-converge. For these workloads we will continue to
>>>> + * increase throttling until the guest is paused long enough to complete the
>>>> + * migration. This essentially becomes a non-live migration.
>>>> + */
>>>> +static void mig_throttle_guest_down(void)
>>>> +{
>>>> +    CPUState *cpu;
>>>> +
>>>> +    CPU_FOREACH(cpu) {
>>>> +        /* We have not started throttling yet. Lets start it.*/
>>>> +        if (!cpu_throttle_active(cpu)) {
>>>> +            cpu_throttle_start(cpu, 0.2);
>>>> +        }
>>>> +
>>>> +        /* Throttling is already in place. Just increase the throttling rate */
>>>> +        else {
>>>> +            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
>>>> +        }
>>>
>>> Now that migration has migrate_parameters, it would be best to replace
>>> the magic numbers (the 0.2, the *2 - anything else?) by parameters that can
>>> change the starting throttling and increase rate. It would probably also be
It would probably also be >>> good to make the current throttling rate visible in info somewhere; maybe >>> info migrate? >>> >> >> I did consider all of this. However, I don't think that the controls >> this patch provides are an ideal throttling mechanism. I suspect someone >> with >> vcpu/scheduling experience could whip up something more user friendly and >> cleaner. >> I merely propose this because it seems better than what we have today for >> auto-converge. >> >> Also, I'm not sure how useful the information really is to the user. The >> fact that it is a ratio instead of a percentage might be confusing. Also, >> I suspect it is not >> truly very accurate. Again, I was going for "make it better", not "make it >> perfect". >> >> Lastly, if we define this external interface we are kind of stuck with it, >> yes? > > Well, one thing you can do is add a parameter with a name starting with x- > which means it's not a fixed interface (so things like libvirt wont use it). > And the reason I was interested in seeing the information was otherwise > we don't really have any way of knowing how well the code is working; > is it already throttling down more and more? > Fair point. Can we add a qmp/hmp command like "info x-cpu-throttle"? Or a new line in "info migrate" output, "x-throttle-ration: 1.0" perhaps? Would this mess up libvirt's parsing of the hmp/qmp data? -- -- Jason J. Herne (jjherne@linux.vnet.ibm.com)