From: Chegu Vinod <chegu_vinod@hp.com>
To: quintela@redhat.com
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org,
	anthony@codemonkey.ws, owasserm@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH v2] Throttle-down guest when live migration does not converge.
Date: Tue, 30 Apr 2013 08:55:16 -0700	[thread overview]
Message-ID: <517FE964.2050702@hp.com> (raw)
In-Reply-To: <877gjkm4ds.fsf@elfo.elfo>

On 4/30/2013 8:20 AM, Juan Quintela wrote:
> Chegu Vinod <chegu_vinod@hp.com> wrote:
>> Busy enterprise workloads hosted on large-sized VMs tend to dirty
>> memory faster than the transfer rate achieved via live guest migration.
>> Despite some good recent improvements (and using dedicated 10Gig NICs
>> between hosts), the live migration does NOT converge.
>>
>> A few options that were discussed or are being pursued to help with
>> the convergence issue include:
>>
>> 1) Slow down the guest considerably via cgroups CPU controls - requires
>>     libvirt client support to detect & trigger the action, but conceptually
>>     similar to this RFC change.
>>
>> 2) Speed up transfer rate:
>>     - RDMA based Pre-copy - lower overhead and fast (Unfortunately
>>       has a few restrictions and some customers still choose not
>>       to deploy RDMA :-( ).
>>     - Add parallelism to improve transfer rate and use multiple 10Gig
>>       connections (bonded). - could add some overhead on the host.
>>
>> 3) Post-copy (preferably with RDMA) or a Pre+Post copy hybrid - Sounds
>>     promising, but we need to consider & handle newer failure scenarios.
>>
>> If an enterprise user chooses to force convergence of their migration
>> via the new capability "auto-converge", then with this change we auto-detect
>> the lack-of-convergence scenario and trigger a slowdown of the workload
>> by explicitly disallowing the VCPUs from spending much time in the VM
>> context.
>>
>> The migration thread tries to catch up, and this eventually leads
>> to convergence in some "deterministic" amount of time. Yes, it does
>> impact the performance of all the VCPUs, but in my observation that
>> lasts only for a short duration, i.e. we end up entering
>> stage 3 (the downtime phase) soon after that.
>>
>> No external trigger is required (unlike option 1), and it can co-exist
>> with enhancements being pursued as part of option 2 (e.g. RDMA).
>>
>> Thanks to Juan and Paolo for their useful suggestions.
>>
>> Verified the convergence using the following:
>> - SpecJbb2005 workload running on a 20VCPU/256G guest (~80% busy)
>> - OLTP like workload running on a 80VCPU/512G guest (~80% busy)
>>
>> Sample results with the SpecJbb2005 workload (migrate speed set to 20Gb
>> and migrate downtime set to 4 seconds):
>>
>> (qemu) info migrate
>> capabilities: xbzrle: off auto-converge: off  <----
>> Migration status: active
>> total time: 1487503 milliseconds
> 148 seconds

1487 seconds, and the migration still has not completed.
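
(For reference, the runs above would be set up with HMP commands along
these lines; this is a sketch - the capability name follows this v2
patch, and the destination URI is just a placeholder:)

    (qemu) migrate_set_capability auto-converge on
    (qemu) migrate_set_speed 20G
    (qemu) migrate_set_downtime 4
    (qemu) migrate -d tcp:<dest-host>:4444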

>
>> expected downtime: 519 milliseconds
>> transferred ram: 383749347 kbytes
>> remaining ram: 2753372 kbytes
>> total ram: 268444224 kbytes
>> duplicate: 65461532 pages
>> skipped: 64901568 pages
>> normal: 95750218 pages
>> normal bytes: 383000872 kbytes
>> dirty pages rate: 67551 pages
>>
>> ---
>>
>> (qemu) info migrate
>> capabilities: xbzrle: off auto-converge: on   <----
>> Migration status: completed
>> total time: 241161 milliseconds
>> downtime: 6373 milliseconds
> 6.3 seconds and finished,  not bad at all O:-)
That's the *downtime*. The total time for the migration to complete is
241 secs. (SpecJBB is one of those workloads that dirties memory quite
a bit.)

>
> How much does the guest throughput drop while we enter auto-converge mode?

Workload performance drops for a short duration, but the migration soon
switches to stage 3.

>
>> transferred ram: 28235307 kbytes
>> remaining ram: 0 kbytes
>> total ram: 268444224 kbytes
>> duplicate: 64946416 pages
>> skipped: 64903523 pages
>> normal: 7044971 pages
>> normal bytes: 28179884 kbytes
>>
>> Changes from v1:
>> - rebased to latest qemu.git
>> - added auto-converge capability (default off) - suggested by Anthony
>>   Liguori and Eric Blake.
>>
>> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
>> @@ -379,12 +380,20 @@ static void migration_bitmap_sync(void)
>>       MigrationState *s = migrate_get_current();
>>       static int64_t start_time;
>>       static int64_t num_dirty_pages_period;
>> +    static int64_t bytes_xfer_prev;
>>       int64_t end_time;
>> +    int64_t bytes_xfer_now;
>> +    static int dirty_rate_high_cnt;
>> +
>> +    if (migrate_auto_converge() && !bytes_xfer_prev) {
> Just do the !bytes_xfer_prev test here?  migrate_auto_converge() is more
> expensive to call than just doing the assignment.

Sure
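
Something along these lines, then (a sketch; ram_bytes_transferred() is
my assumption for the accessor that seeds bytes_xfer_prev):

    if (!bytes_xfer_prev) {
        /* Cheap test first: the one-time assignment is cheaper than
           calling migrate_auto_converge() on every bitmap sync. */
        bytes_xfer_prev = ram_bytes_transferred();
    }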
>
>> +
>> +    if (value) {
>> +        return true;
>> +    }
>> +    return false;
> this code is just:
>
> return value;

ok
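
i.e. the helper collapses to something like this (a sketch; the
capability-lookup names are assumed from the rest of the series, not
shown in this hunk):

    bool migrate_auto_converge(void)
    {
        MigrationState *s = migrate_get_current();

        /* The capability flag is already a bool; return it directly. */
        return s->enabled_capabilities[MIGRATION_CAPABILITY_AUTO_CONVERGE];
    }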

>
>> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
>> index 6f0200a..9a3886d 100644
>> --- a/include/qemu/main-loop.h
>> +++ b/include/qemu/main-loop.h
>> @@ -299,6 +299,9 @@ void qemu_mutex_lock_iothread(void);
>>    */
>>   void qemu_mutex_unlock_iothread(void);
>>   
>> +void qemu_mutex_lock_mig_throttle(void);
>> +void qemu_mutex_unlock_mig_throttle(void);
>> +
>>   /* internal interfaces */
>>   
>>   void qemu_fd_register(int fd);
>> diff --git a/kvm-all.c b/kvm-all.c
>> index 2d92721..a92cb77 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -33,6 +33,8 @@
>>   #include "exec/memory.h"
>>   #include "exec/address-spaces.h"
>>   #include "qemu/event_notifier.h"
>> +#include "sysemu/cpus.h"
>> +#include "migration/migration.h"
>>   
>>   /* This check must be after config-host.h is included */
>>   #ifdef CONFIG_EVENTFD
>> @@ -116,6 +118,8 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
>>       KVM_CAP_LAST_INFO
>>   };
>>   
>> +static void mig_delay_vcpu(void);
>> +
> move the function definition to here?
Ok.
>> +
>> +static bool throttling;
>> +bool throttling_now(void)
>> +{
>> +    if (throttling) {
>> +        return true;
>> +    }
>> +    return false;
>    return throttling;
>
>> +/* Stub used for getting the vcpu out of the VM and into QEMU via
>> +   run_on_cpu(). */
>> +static void mig_kick_cpu(void *opq)
>> +{
>> +    return;
>> +}
>> +
>> +/* To reduce the dirty rate, explicitly disallow the VCPUs from spending
>> +   much time in the VM. The migration thread will try to catch up.
>> +   Workload will experience a greater performance drop but for a shorter
>> +   duration.
>> +*/
>> +void *migration_throttle_down(void *opaque)
>> +{
>> +    throttling = true;
>> +    while (throttling_needed()) {
>> +        CPUArchState *penv = first_cpu;
> I am not sure that we can follow the list without the iothread lock
> here.

Hmm... Is this due to vCPU hot-plug that might happen during a live
migration, or due to something else? I was trying to avoid holding the
iothread lock for a long duration and slowing down the migration
thread...
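
If the traversal does need protection, one option would be to hold the
iothread lock across the whole list walk instead of per-CPU (a sketch
against the same CPU-list API the patch uses; it does lengthen the
lock hold time):

    CPUArchState *penv;

    qemu_mutex_lock_iothread();
    for (penv = first_cpu; penv != NULL; penv = penv->next_cpu) {
        /* Kick each vcpu out of the VM context; run_on_cpu() drops and
           retakes the lock internally while it waits for completion. */
        run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL);
    }
    qemu_mutex_unlock_iothread();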

>
>> +        while (penv) {
>> +            qemu_mutex_lock_iothread();
>> +            run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL);
>> +            qemu_mutex_unlock_iothread();
>> +            penv = penv->next_cpu;
>> +        }
>> +        g_usleep(25*1000);
>> +    }
>> +    throttling = false;
>> +    return NULL;
>> +}
> .
Thanks
Vinod

Thread overview: 9+ messages
2013-04-27 20:50 [Qemu-devel] [RFC PATCH v2] Throttle-down guest when live migration does not converge Chegu Vinod
2013-04-29 14:53 ` Eric Blake
2013-04-29 17:48   ` Chegu Vinod
2013-04-30 15:04 ` Orit Wasserman
2013-04-30 17:51   ` Chegu Vinod
2013-04-30 15:20 ` Juan Quintela
2013-04-30 15:55   ` Chegu Vinod [this message]
2013-04-30 16:01     ` Juan Quintela
2013-04-30 17:48       ` Chegu Vinod
