From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <556C936F.8090606@linux.vnet.ibm.com>
Date: Mon, 01 Jun 2015 13:16:31 -0400
From: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
MIME-Version: 1.0
References: <1433171851-18507-1-git-send-email-jjherne@linux.vnet.ibm.com>
	<1433171851-18507-3-git-send-email-jjherne@linux.vnet.ibm.com>
	<20150601153259.GK2314@work-vm>
In-Reply-To: <20150601153259.GK2314@work-vm>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 2/2] migration: Dynamic cpu throttling for
 auto-converge
Reply-To: jjherne@linux.vnet.ibm.com
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: amit.shah@redhat.com, borntraeger@de.ibm.com, quintela@redhat.com, qemu-devel@nongnu.org, afaerber@suse.de

On 06/01/2015 11:32 AM, Dr. David Alan Gilbert wrote:
> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>> Remove traditional auto-converge static 30ms throttling code and replace it
>> with a dynamic throttling algorithm.
>>
>> Additionally, be more aggressive when deciding when to start throttling.
>> Previously we waited until four unproductive memory passes. Now we begin
>> throttling after only two unproductive memory passes. Four seemed quite
>> arbitrary and only waiting for two passes allows us to complete the migration
>> faster.
>>
>> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
>> Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
>> ---
>>   arch_init.c           | 95 +++++++++++++++++----------------------------------
>>   migration/migration.c |  9 +++++
>>   2 files changed, 41 insertions(+), 63 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 23d3feb..73ae494 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -111,9 +111,7 @@ int graphic_depth = 32;
>>   #endif
>>
>>   const uint32_t arch_type = QEMU_ARCH;
>> -static bool mig_throttle_on;
>>   static int dirty_rate_high_cnt;
>> -static void check_guest_throttling(void);
>>
>>   static uint64_t bitmap_sync_count;
>>
>> @@ -487,6 +485,31 @@ static size_t save_page_header(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
>>       return size;
>>   }
>>
>> +/* Reduce amount of guest cpu execution to hopefully slow down memory writes.
>> + * If guest dirty memory rate is reduced below the rate at which we can
>> + * transfer pages to the destination then we should be able to complete
>> + * migration. Some workloads dirty memory way too fast and will not effectively
>> + * converge, even with auto-converge. For these workloads we will continue to
>> + * increase throttling until the guest is paused long enough to complete the
>> + * migration. This essentially becomes a non-live migration.
>> + */
>> +static void mig_throttle_guest_down(void)
>> +{
>> +    CPUState *cpu;
>> +
>> +    CPU_FOREACH(cpu) {
>> +        /* We have not started throttling yet. Lets start it.*/
>> +        if (!cpu_throttle_active(cpu)) {
>> +            cpu_throttle_start(cpu, 0.2);
>> +        }
>> +
>> +        /* Throttling is already in place. Just increase the throttling rate */
>> +        else {
>> +            cpu_throttle_start(cpu, cpu_throttle_get_ratio(cpu) * 2);
>> +        }
>
> Now that migration has migrate_parameters, it would be best to replace
> the magic numbers (the 0.2, the *2 - anything else?)  by parameters that can
> change the starting throttling and increase rate.  It would probably also be
> good to make the current throttling rate visible in info somewhere; maybe
> info migrate?
>

I did consider all of this. However, I don't think that the controls this
patch provides are an ideal throttling mechanism. I suspect someone with
vcpu/scheduling experience could whip up something more user friendly and
cleaner. I merely propose this because it seems better than what we have
today for auto-converge.

Also, I'm not sure how useful the information really is to the user. The
fact that it is a ratio instead of a percentage might be confusing. Also, I
suspect it is not truly very accurate. Again, I was going for "make it
better", not "make it perfect".
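
To illustrate the ratio versus percentage point, here is a minimal sketch
of how a ratio could be turned into a percentage for display. It assumes
the ratio means sleep time divided by run time, which this message does not
actually spell out, and throttle_ratio_to_percent() is a hypothetical
helper, not something in the patch.

```c
#include <assert.h>

/* ASSUMPTION: "ratio" is sleep time divided by run time. If the patch
 * defines it differently, this conversion does not apply.
 * Hypothetical helper, not part of QEMU. */
static double throttle_ratio_to_percent(double ratio)
{
    assert(ratio >= 0.0);
    /* Share of wall-clock time spent sleeping: sleep / (sleep + run). */
    return 100.0 * ratio / (1.0 + ratio);
}
```

Under that assumption the starting ratio of 0.2 would mean the vcpu sleeps
for roughly one sixth of the time, and a ratio of 1.0 would mean half.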

Lastly, if we define this external interface we are kind of stuck with it,
yes? In this regard we should be sure that this is how we want cpu
throttling to work. Alternatively, I propose to accept this patch set as-is
and then work on a real vcpu throttling mechanism that can be used for
auto-converge as well as a user controllable guest throttling/limiting
mechanism. Once that is in place we can migrate (no pun intended) the
auto-converge code to the new way and remove this stuff.

With all of that said, I'm willing to provide the requested controls if we
really feel the pros outweigh the cons. Thanks for your review :).
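
For reference, a rough sketch of what the requested controls might look
like if the 0.2 and the *2 became tunables. The struct, its field names and
next_throttle_ratio() are all hypothetical and exist only for illustration;
none of them are in the patch or in migrate_parameters today.

```c
#include <assert.h>

/* Hypothetical tunables replacing the magic numbers in the patch. */
struct throttle_params {
    double initial_ratio;   /* ratio used for the first throttle step */
    double increase_factor; /* multiplier applied on each later step */
};

/* Return the ratio to apply next. A current ratio of 0 (or less) is
 * treated as "throttling has not started yet". */
static double next_throttle_ratio(const struct throttle_params *p,
                                  double current)
{
    assert(p->initial_ratio > 0.0 && p->increase_factor > 1.0);
    if (current <= 0.0) {
        return p->initial_ratio;
    }
    return current * p->increase_factor;
}
```

With initial_ratio = 0.2 and increase_factor = 2.0 this follows the same
progression as the patch: 0.2, then 0.4, then 0.8, and so on.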

...

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)