From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <556F412C.9090106@linux.vnet.ibm.com>
Date: Wed, 03 Jun 2015 14:02:20 -0400
From: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
MIME-Version: 1.0
References: <1433267209-9882-1-git-send-email-jjherne@linux.vnet.ibm.com> <1433267209-9882-2-git-send-email-jjherne@linux.vnet.ibm.com> <87pp5drzrl.fsf@neno.neno>
In-Reply-To: <87pp5drzrl.fsf@neno.neno>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2 1/3] cpu: Provide vcpu throttling interface
Reply-To: jjherne@linux.vnet.ibm.com
To: quintela@redhat.com
Cc: amit.shah@redhat.com, borntraeger@de.ibm.com, qemu-devel@nongnu.org, dgilbert@redhat.com, afaerber@suse.de

On 06/03/2015 03:56 AM, Juan Quintela wrote:
> "Jason J. Herne" <jjherne@linux.vnet.ibm.com> wrote:
>> Provide a method to throttle guest cpu execution. CPUState is augmented
>> with timeout controls and throttle start/stop functions. To throttle the
>> guest cpu, the caller simply calls the throttle start function and
>> provides a ratio of sleep time to normal execution time.
>>
>> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
>> Reviewed-by: Matthew Rosato
>
> Hi
>
> Sorry that I replied to your previous submission; I had "half done" this
> mail since yesterday. I still think that all of it applies.
>
>> ---
>>  cpus.c            | 62 ++++++++++++++++++++++++++++++++++++++++++++++
>>  include/qom/cpu.h | 46 ++++++++++++++++++++++++++++++++++
>>  2 files changed, 108 insertions(+)
>>
>> diff --git a/cpus.c b/cpus.c
>> index de6469f..7568357 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -64,6 +64,9 @@
>>
>>  #endif /* CONFIG_LINUX */
>>
>> +/* Number of ms between cpu throttle operations */
>> +#define CPU_THROTTLE_TIMESLICE 10
>
> We check for throttling on each cpu every 10ms, but in patch 2 we can see
> that we only change the throttling rate each time migration_bitmap_sync()
> is called, and that only happens once per round through all the pages.
> Auto-converge normally only matters for machines with lots of memory, so
> each of those changes will be much more than 10ms apart (we used to
> change it every 4 passes; you changed it to every 2 passes, and you add
> 0.2). I think I would prefer to just change it every single pass, but add
> 0.1 each pass? Simpler, and the end result would be the same?

Well, we certainly could make it run every pass, but I think it would get
a little too aggressive then. The reason is that we do not increment the
throttle rate by adding 0.2 each time; we increment it by multiplying the
current rate by 2. So by doing that every pass we would be doubling the
exponential growth rate. I will admit the numbers I chose are hardly
scientific... I chose them because they seemed to provide a decent balance
of "throttling aggression" in my workloads.
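To put numbers on that difference, here is a small standalone comparison
(an illustration only, not patch code; the 0.2 starting ratio, the 1.0
cap, and the 8-pass horizon are assumptions made for the example):

#include <stdio.h>

/* Compare the additive scheme suggested above (+0.1 every pass) with
 * the multiplicative scheme the patch uses (x2 every 2 passes). */
int main(void)
{
    double additive = 0.1;        /* +0.1 every pass */
    double multiplicative = 0.2;  /* x2 every 2 passes */
    int pass;

    for (pass = 1; pass <= 8; pass++) {
        printf("pass %d: additive %.1f, multiplicative %.1f\n", pass,
               additive, multiplicative > 1.0 ? 1.0 : multiplicative);
        additive += 0.1;
        if (pass % 2 == 0) {
            multiplicative *= 2;
        }
    }
    return 0;
}

Running it shows the multiplicative scheme hitting full throttle by pass 7
while the additive one is still at 0.7; doubling on every single pass
instead would saturate even faster, which is what I mean by "too
aggressive".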
> This is what the old code did (the new code does exactly the same, just
> in the previous patch):
>
> -static void mig_sleep_cpu(void *opq)
> -{
> -    qemu_mutex_unlock_iothread();
> -    g_usleep(30*1000);
> -    qemu_mutex_lock_iothread();
> -}
>
> So, what we are doing is calling async_run_on_cpu() with this function.
>
> To make things short, we end up here:
>
> static void flush_queued_work(CPUState *cpu)
> {
>     struct qemu_work_item *wi;
>
>     if (cpu->queued_work_first == NULL) {
>         return;
>     }
>
>     while ((wi = cpu->queued_work_first)) {
>         cpu->queued_work_first = wi->next;
>         wi->func(wi->data);
>         wi->done = true;
>         if (wi->free) {
>             g_free(wi);
>         }
>     }
>     cpu->queued_work_last = NULL;
>     qemu_cond_broadcast(&qemu_work_cond);
> }
>
> So we are walking a list that is protected by the iothread lock, and we
> are dropping the lock in the middle of the walk..... just hoping that
> nothing else is calling async_run_on_cpu() from a different place....
>
> Could we change this to something like:
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 17a3771..ae9635f 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1787,6 +1787,7 @@ int kvm_cpu_exec(CPUState *cpu)
>  {
>      struct kvm_run *run = cpu->kvm_run;
>      int ret, run_ret;
> +    long throttling_time;
>
>      DPRINTF("kvm_cpu_exec()\n");
>
> @@ -1813,8 +1814,15 @@ int kvm_cpu_exec(CPUState *cpu)
>           */
>          qemu_cpu_kick_self();
>      }
> +    throttling_time = cpu->throttling_time;
> +    cpu->throttling_time = 0;
> +    cpu->sleeping_time += throttling_time;
>      qemu_mutex_unlock_iothread();
>
> +    if (throttling_time) {
> +        g_usleep(throttling_time);
> +    }
>      run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);
>
>      qemu_mutex_lock_iothread();
>
> And then, in the place where we do the throttling now, we just decide
> what throttling_time and sleeping_time should be? No need to drop the
> lock in the middle of the work-queue walk at all.
>
> It adds an if to the fast path, but we make a call to the hypervisor
> each time anyway, so it shouldn't be so bad, no?
>
> For tcg/xen we would have to search for a different place to put this
> code, but you get the idea. The reason for putting it here is that this
> is where the iothread lock is dropped, so we don't lose any locking; any
> other place we put it would need review with respect to this.
>
> Jason, I am really, really sorry to have opened this big can of worms on
> your patch. Now you know why I was saying that "improving" auto-converge
> was not as easy as it looked.
>
> With respect to the last comment, I also think that we can include
> Jason's patches. They are an improvement over what we have now. It is
> just that they need more improvements.

Yes, I understand what you are suggesting here. I can certainly look into
it if you would prefer that rather than accepting the current design. The
reason I did things using the timer and thread is that it fit into the old
auto-converge code very nicely.
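For context, the timer-and-thread design I am referring to works roughly
like this (a minimal sketch; the identifiers below are illustrative, not
the exact patch code):

/* Sketch: a timer fires every CPU_THROTTLE_TIMESLICE ms and queues a
 * sleep on each vcpu via async_run_on_cpu(), mirroring what the old
 * mig_sleep_cpu() did. Assumes throttle_timer was created with
 * timer_new_ms() on QEMU_CLOCK_VIRTUAL_RT. */

static QEMUTimer *throttle_timer;
static float throttle_ratio;    /* sleep time / execution time */

static void cpu_throttle_do_sleep(void *opaque)
{
    long sleep_us = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE * 1000);

    /* Same pattern as the old code: drop the iothread lock while the
     * vcpu sleeps, then take it back. */
    qemu_mutex_unlock_iothread();
    g_usleep(sleep_us);
    qemu_mutex_lock_iothread();
}

static void cpu_throttle_timer_tick(void *opaque)
{
    CPUState *cpu;

    CPU_FOREACH(cpu) {
        async_run_on_cpu(cpu, cpu_throttle_do_sleep, NULL);
    }
    timer_mod(throttle_timer,
              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) +
              CPU_THROTTLE_TIMESLICE);
}

The throttle start/stop functions then only need to set throttle_ratio
and arm or delete the timer.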
Does it make sense to go forward with my current design (modified as per
your earlier suggestions), and then follow up with your proposal at a
later date?

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)