Re: [Qemu-devel] [RFC PATCH v2] Throttle-down guest when live migration does not converge.

From: Chegu Vinod <chegu_vinod@hp.com>
To: quintela@redhat.com
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org,
	anthony@codemonkey.ws, owasserm@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH v2] Throttle-down guest when live migration does not converge.
Date: Tue, 30 Apr 2013 08:55:16 -0700
Message-ID: <517FE964.2050702@hp.com>
In-Reply-To: <877gjkm4ds.fsf@elfo.elfo>

On 4/30/2013 8:20 AM, Juan Quintela wrote:
> Chegu Vinod <chegu_vinod@hp.com> wrote:
>> Busy enterprise workloads hosted on large sized VM's tend to dirty
>> memory faster than the transfer rate achieved via live guest migration.
>> Despite some good recent improvements (& using dedicated 10Gig NICs
>> between hosts) the live migration does NOT converge.
>>
>> A few options that were discussed/being-pursued to help with
>> the convergence issue include:
>>
>> 1) Slow down guest considerably via cgroup's CPU controls - requires
>>     libvirt client support to detect & trigger action, but conceptually
>>     similar to this RFC change.
>>
>> 2) Speed up transfer rate:
>>     - RDMA based Pre-copy - lower overhead and fast (Unfortunately
>>       has a few restrictions and some customers still choose not
>>       to deploy RDMA :-( ).
>>     - Add parallelism to improve transfer rate and use multiple 10Gig
>>       connections (bonded). - could add some overhead on the host.
>>
>> 3) Post-copy (preferably with RDMA) or a Pre+Post copy hybrid - Sounds
>>     promising but need to consider & handle newer failure scenarios.
>>
>> If an enterprise user chooses to force convergence of their migration
>> via the new capability "auto-converge" then with this change we auto-detect
>> lack of convergence scenario and trigger a slow down of the workload
>> by explicitly disallowing the VCPUs from spending much time in the VM
>> context.
>>
>> The migration thread tries to catch up and this eventually leads
>> to convergence in some "deterministic" amount of time. Yes it does
>> impact the performance of all the VCPUs but in my observation that
>> lasts only for a short duration of time. i.e. we end up entering
>> stage 3 (downtime phase) soon after that.
>>
>> No external trigger is required (unlike option 1) and it can co-exist
>> with enhancements being pursued as part of Option 2 (e.g. RDMA).
>>
>> Thanks to Juan and Paolo for their useful suggestions.
>>
>> Verified the convergence using the following :
>> - SpecJbb2005 workload running on a 20VCPU/256G guest(~80% busy)
>> - OLTP like workload running on a 80VCPU/512G guest (~80% busy)
>>
>> Sample results with SpecJbb2005 workload : (migrate speed set to 20Gb and
>> migrate downtime set to 4seconds).
>>
>> (qemu) info migrate
>> capabilities: xbzrle: off auto-converge: off  <----
>> Migration status: active
>> total time: 1487503 milliseconds
> 148 seconds

That's 1487 seconds, and the migration still has not completed.

>
>> expected downtime: 519 milliseconds
>> transferred ram: 383749347 kbytes
>> remaining ram: 2753372 kbytes
>> total ram: 268444224 kbytes
>> duplicate: 65461532 pages
>> skipped: 64901568 pages
>> normal: 95750218 pages
>> normal bytes: 383000872 kbytes
>> dirty pages rate: 67551 pages
>>
>> ---
>>
>> (qemu) info migrate
>> capabilities: xbzrle: off auto-converge: on   <----
>> Migration status: completed
>> total time: 241161 milliseconds
>> downtime: 6373 milliseconds
> 6.3 seconds and finished,  not bad at all O:-)
That's the *downtime*. The total time for the migration to complete is
241 secs. (SpecJBB is one of those workloads that dirties memory quite
a bit.)
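
To make the two "info migrate" outputs quoted in this message easier to
compare, here is a small sketch that converts the reported totals. The
helper names are made up for illustration and are not part of the patch;
the input numbers are the ones shown in the outputs.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a duration reported in milliseconds to seconds. */
double ms_to_seconds(int64_t ms)
{
    return (double)ms / 1000.0;
}

/* How many times the guest's RAM size has been sent over the wire.
 * Both arguments are in kbytes, as printed by "info migrate". */
double transfer_ratio(int64_t transferred_kb, int64_t total_kb)
{
    assert(total_kb > 0);
    return (double)transferred_kb / (double)total_kb;
}
```

With auto-converge off, ms_to_seconds(1487503) is about 1487.5 seconds
and transfer_ratio(383749347, 268444224) is about 1.43, i.e. more than
the whole guest RAM had been sent and the migration was still active.
With auto-converge on, ms_to_seconds(241161) is about 241.2 seconds and
transfer_ratio(28235307, 268444224) is about 0.105.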

>
> How much does the guest throughput drop while we enter autoconverge mode?

Workload performance drops for a short duration, but it soon
switches to stage 3.

>
>> transferred ram: 28235307 kbytes
>> remaining ram: 0 kbytes
>> total ram: 268444224 kbytes
>> duplicate: 64946416 pages
>> skipped: 64903523 pages
>> normal: 7044971 pages
>> normal bytes: 28179884 kbytes
>>
>> Changes from v1:
>> - rebased to latest qemu.git
>> - added auto-converge capability(default off) - suggested by Anthony Liguori &
>>                                                  Eric Blake.
>>
>> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
>> @@ -379,12 +380,20 @@ static void migration_bitmap_sync(void)
>>       MigrationState *s = migrate_get_current();
>>       static int64_t start_time;
>>       static int64_t num_dirty_pages_period;
>> +    static int64_t bytes_xfer_prev;
>>       int64_t end_time;
>> +    int64_t bytes_xfer_now;
>> +    static int dirty_rate_high_cnt;
>> +
>> +    if (migrate_auto_converge() && !bytes_xfer_prev) {
> Just do the !bytes_xfer_prev test here?  migrate_auto_converge() is more
> expensive to call than just doing the assignment.

Sure
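
For reference, a standalone sketch of the difference between the two
shapes of the check. Everything here is illustrative:
auto_converge_enabled() is a stand-in for migrate_auto_converge(), the
counter exists only to show how often it gets called, and the value
being assigned is an assumption since that part of the hunk is not
quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counts calls to the stand-in capability lookup, for illustration only. */
int capability_lookups;

/* Hypothetical stand-in for migrate_auto_converge(); not the QEMU function. */
bool auto_converge_enabled(void)
{
    capability_lookups++;
    return true;
}

/* Shape of the check as posted in v2: capability lookup on every call. */
void seed_prev_v2(int64_t *bytes_xfer_prev, int64_t transferred_now)
{
    if (auto_converge_enabled() && !*bytes_xfer_prev) {
        *bytes_xfer_prev = transferred_now;
    }
}

/* Suggested shape: only the cheap zero test guards the one-time assignment. */
void seed_prev_simple(int64_t *bytes_xfer_prev, int64_t transferred_now)
{
    if (!*bytes_xfer_prev) {
        *bytes_xfer_prev = transferred_now;
    }
}
```

Both variants seed the previous-bytes value only once; the second one
does so without ever calling the capability lookup.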
>
>> +
>> +    if (value) {
>> +        return true;
>> +    }
>> +    return false;
> this code is just:
>
> return value;

ok

>
>> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
>> index 6f0200a..9a3886d 100644
>> --- a/include/qemu/main-loop.h
>> +++ b/include/qemu/main-loop.h
>> @@ -299,6 +299,9 @@ void qemu_mutex_lock_iothread(void);
>>    */
>>   void qemu_mutex_unlock_iothread(void);
>>   
>> +void qemu_mutex_lock_mig_throttle(void);
>> +void qemu_mutex_unlock_mig_throttle(void);
>> +
>>   /* internal interfaces */
>>   
>>   void qemu_fd_register(int fd);
>> diff --git a/kvm-all.c b/kvm-all.c
>> index 2d92721..a92cb77 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -33,6 +33,8 @@
>>   #include "exec/memory.h"
>>   #include "exec/address-spaces.h"
>>   #include "qemu/event_notifier.h"
>> +#include "sysemu/cpus.h"
>> +#include "migration/migration.h"
>>   
>>   /* This check must be after config-host.h is included */
>>   #ifdef CONFIG_EVENTFD
>> @@ -116,6 +118,8 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
>>       KVM_CAP_LAST_INFO
>>   };
>>   
>> +static void mig_delay_vcpu(void);
>> +
> move function definition to here?
Ok.
>> +
>> +static bool throttling;
>> +bool throttling_now(void)
>> +{
>> +    if (throttling) {
>> +        return true;
>> +    }
>> +    return false;
>    return throttling;
>
>> +/* Stub used for getting the vcpu out of VM and into qemu via
>> +   run_on_cpu()*/
>> +static void mig_kick_cpu(void *opq)
>> +{
>> +    return;
>> +}
>> +
>> +/* To reduce the dirty rate explicitly disallow the VCPUs from spending
>> +   much time in the VM. The migration thread will try to catchup.
>> +   Workload will experience a greater performance drop but for a shorter
>> +   duration.
>> +*/
>> +void *migration_throttle_down(void *opaque)
>> +{
>> +    throttling = true;
>> +    while (throttling_needed()) {
>> +        CPUArchState *penv = first_cpu;
> I am not sure that we can follow the list without the iothread lock
> here.

Hmm.. Is this due to vcpu hot plug that might happen at the time of live
migration, or due to something else? I was trying to avoid holding the
iothread lock for a longer duration and slowing down the migration
thread...
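
One possible way to address the concern, sketched as a standalone
pthreads analog and not as QEMU code: take the lock before reading the
list head and hold it for the whole walk, then drop it before sleeping.
struct cpu_node, big_lock and kick_all_locked() are hypothetical names;
whether this is acceptable for the real iothread lock is exactly the
open question.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU list entry; not a QEMU type. */
struct cpu_node {
    int kicks;
    struct cpu_node *next;
};

/* Stand-in for the lock that protects the list against hot plug. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Walk the list with the lock held across the whole traversal, so that
 * neither the head nor any next pointer is read while another thread
 * could be modifying the list. Returns the number of nodes visited. */
int kick_all_locked(struct cpu_node **head)
{
    int visited = 0;

    pthread_mutex_lock(&big_lock);
    for (struct cpu_node *n = *head; n != NULL; n = n->next) {
        n->kicks++;      /* placeholder for the per-vCPU kick */
        visited++;
    }
    pthread_mutex_unlock(&big_lock);

    return visited;
}
```

The lock is held once per pass instead of once per vCPU, and the 25 ms
sleep between passes would stay outside it, so other threads are not
blocked while the throttling thread sleeps.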

>
>> +        while (penv) {
>> +            qemu_mutex_lock_iothread();
>> +            run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL);
>> +            qemu_mutex_unlock_iothread();
>> +            penv = penv->next_cpu;
>> +        }
>> +        g_usleep(25*1000);
>> +    }
>> +    throttling = false;
>> +    return NULL;
>> +}
> .
Thanks
Vinod
