From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jay Zhou <jianjay.zhou@huawei.com>,
rkrcmar@redhat.com, qemu-devel@nongnu.org, quintela@redhat.com,
armbru@redhat.com, arei.gonglei@huawei.com,
zhang.zhanghailiang@huawei.com, wangxinxin.wang@huawei.com,
weidong.huang@huawei.com, aarcange@redhat.com,
Xiao Guangrong <xiaoguangrong@tencent.com>
Subject: Re: [Qemu-devel] [PATCH] migration: optimize the downtime
Date: Mon, 24 Jul 2017 20:03:32 +0100
Message-ID: <20170724190331.GH2127@work-vm>
In-Reply-To: <9cea170f-90eb-78e5-6b52-8dd882efd160@redhat.com>

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 24/07/2017 17:35, Dr. David Alan Gilbert wrote:
> > * Jay Zhou (jianjay.zhou@huawei.com) wrote:
> >> Hi Dave,
> >>
> >> On 2017/7/21 17:49, Dr. David Alan Gilbert wrote:
> >>> * Jay Zhou (jianjay.zhou@huawei.com) wrote:
> >>>> qemu_savevm_state_cleanup() takes about 300ms in my ram migration tests
> >>>> with an 8U24G vm (20G is really occupied); the main cost comes from
> >>>> KVM_SET_USER_MEMORY_REGION ioctl when mem.memory_size = 0 in
> >>>> kvm_set_user_memory_region(). In kmod, the main cost is
> >>>> kvm_zap_obsolete_pages(), which traverses the active_mmu_pages list to
> >>>> zap the unsync sptes.
> >>>
> >>> Hi Jay,
> >>> Is this actually increasing the real downtime when the guest isn't
> >>> running, or is it just the reported time? I see that the s->downtime
> >>> value is calculated right after where we currently call
> >>> qemu_savevm_state_cleanup.
> >>
> >> It actually increased the real downtime, I used the "ping" command to
> >> test. Reason is that the source side libvirt sends qmp to qemu to query
> >> the status of migration, which needs the BQL. qemu_savevm_state_cleanup
> >> is done with BQL, qemu can not handle the qmp if qemu_savevm_state_cleanup
> >> has not finished. And the source side libvirt delays about 300ms to notify
> >> the destination side libvirt to send the "cont" command to start the vm.
> >>
> >> I think the value of s->downtime is not accurate enough, maybe we could
> >> move the calculation of end_time after qemu_savevm_state_cleanup has done.
> >
> > I'm copying in Paolo, Radim and Andrea; is there any way we can make the
> > teardown of KVM's dirty tracking not take so long? 300ms is a silly long time
> > on only a small VM.
>
> Xiao Guangrong is working on something vaguely related (but different
> and simpler because it's entirely contained within KVM), which is to
> make log_sync faster.
>
> The Intel folks working on clear containers also would like
> MemoryListeners to have a better complexity, but that's again separate
> from the "zapping" of SPTEs.

MemoryListeners do keep popping up; I remember they're a pain in COLO.

> > Can you tell me which version of libvirt you're using?
> > I thought the newer ones were supposed to use events so they didn't
> > have to poll qemu.
> >
> > If we move qemu_savevm_state_cleanup is it still safe? Are there
> > some things we're supposed to do at that point which go wrong if
> > we don't?
> >
> > I wonder about something like; take a mutex in
> > memory_global_dirty_log_start, release it in
> > memory_global_dirty_log_stop. Then make ram_save_cleanup start
> > a new thread that does the call to memory_global_dirty_log_stop.
>
> I don't like having such a long-lived mutex (it seems like a recipe for
> deadlocks with the BQL), plus memory_region_transaction_commit (the
> expensive part of memory_global_dirty_log_stop) needs to be under the
> BQL itself because it calls MemoryListeners.
>
> Maybe memory_global_dirty_log_stop can delay itself to the next vm_start
> if it's called while runstate_running() returns false (which should be
> always the case)?
>
> It could even be entirely enclosed within memory.c if you do it with a
> VMChangeStateHandler.

This still causes the BQL to be held for quite a while, albeit at a less
critical point.

In both this and the existing case we don't actually need efficiency; what
we need is just not to hold onto the BQL for so long. Could we do a less
efficient commit here, removing one region at a time and yielding the
lock and retaking it between regions?
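
To make the "one region at a time" idea concrete, here is a rough standalone
sketch; it is not QEMU code. A plain pthread mutex stands in for the BQL, and
every name in it (big_lock, region_logging, stop_logging_incrementally and so
on) is made up for illustration. The per-region assignment stands in for
whatever the expensive per-region commit would really be.

```c
/* Sketch only: a pthread mutex stands in for the BQL; all names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_REGIONS 8

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool region_logging[NUM_REGIONS];

/* Turn dirty logging on for every region inside one critical section. */
static void start_logging_all(void)
{
    pthread_mutex_lock(&big_lock);
    for (size_t i = 0; i < NUM_REGIONS; i++) {
        region_logging[i] = true;
    }
    pthread_mutex_unlock(&big_lock);
}

/*
 * Turn dirty logging off one region at a time. The lock is dropped after
 * each region so that other threads waiting on it (a monitor command, say)
 * get a chance to run. Returns the number of separate lock acquisitions.
 */
static size_t stop_logging_incrementally(void)
{
    size_t acquisitions = 0;

    for (size_t i = 0; i < NUM_REGIONS; i++) {
        pthread_mutex_lock(&big_lock);
        acquisitions++;
        region_logging[i] = false;  /* placeholder for the per-region commit */
        pthread_mutex_unlock(&big_lock);
        sched_yield();              /* give waiters a chance to take the lock */
    }

    /* Less efficient overall: one lock round trip per region. */
    assert(acquisitions == NUM_REGIONS);
    return acquisitions;
}

/* Count the regions that still have logging enabled. */
static size_t count_logging_regions(void)
{
    size_t n = 0;

    pthread_mutex_lock(&big_lock);
    for (size_t i = 0; i < NUM_REGIONS; i++) {
        n += region_logging[i] ? 1 : 0;
    }
    pthread_mutex_unlock(&big_lock);
    return n;
}
```

The total work goes up (one lock round trip per region instead of one for the
whole batch), but the longest single hold of the lock is bounded by the cost
of one region, which is the property we care about here. Whether the real
commit can safely be split like this is exactly the open question.
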
> Thanks,
>
> Paolo
>
> > Dave
> >
> >> Thanks,
> >> Jay
> >>
> >>> However, we would need to be a bit careful of anything that needs
> >>> cleaning up before the source restarts on failure; I'm not sure of
> >>> the semantics of all the current things wired into save_cleanup.
> >>>
> >>> Dave
> >>>
> >>>
> >>>> Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
> >>>> ---
> >>>> migration/migration.c | 16 +++++++++-------
> >>>> qmp.c | 10 ++++++++++
> >>>> 2 files changed, 19 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/migration/migration.c b/migration/migration.c
> >>>> index a0db40d..72832be 100644
> >>>> --- a/migration/migration.c
> >>>> +++ b/migration/migration.c
> >>>> @@ -1877,6 +1877,15 @@ static void *migration_thread(void *opaque)
> >>>> if (qemu_file_get_error(s->to_dst_file)) {
> >>>> migrate_set_state(&s->state, current_active_state,
> >>>> MIGRATION_STATUS_FAILED);
> >>>> + /*
> >>>> + * The resource has been allocated by migration will be reused in
> >>>> + * COLO process, so don't release them.
> >>>> + */
> >>>> + if (!enable_colo) {
> >>>> + qemu_mutex_lock_iothread();
> >>>> + qemu_savevm_state_cleanup();
> >>>> + qemu_mutex_unlock_iothread();
> >>>> + }
> >>>> trace_migration_thread_file_err();
> >>>> break;
> >>>> }
> >>>> @@ -1916,13 +1925,6 @@ static void *migration_thread(void *opaque)
> >>>> end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >>>>
> >>>> qemu_mutex_lock_iothread();
> >>>> - /*
> >>>> - * The resource has been allocated by migration will be reused in COLO
> >>>> - * process, so don't release them.
> >>>> - */
> >>>> - if (!enable_colo) {
> >>>> - qemu_savevm_state_cleanup();
> >>>> - }
> >>>> if (s->state == MIGRATION_STATUS_COMPLETED) {
> >>>> uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
> >>>> s->total_time = end_time - s->total_time;
> >>>> diff --git a/qmp.c b/qmp.c
> >>>> index b86201e..0e68eaa 100644
> >>>> --- a/qmp.c
> >>>> +++ b/qmp.c
> >>>> @@ -37,6 +37,8 @@
> >>>> #include "qom/object_interfaces.h"
> >>>> #include "hw/mem/pc-dimm.h"
> >>>> #include "hw/acpi/acpi_dev_interface.h"
> >>>> +#include "migration/migration.h"
> >>>> +#include "migration/savevm.h"
> >>>>
> >>>> NameInfo *qmp_query_name(Error **errp)
> >>>> {
> >>>> @@ -200,6 +202,14 @@ void qmp_cont(Error **errp)
> >>>> if (runstate_check(RUN_STATE_INMIGRATE)) {
> >>>> autostart = 1;
> >>>> } else {
> >>>> + /*
> >>>> + * Delay the cleanup to reduce the downtime of migration.
> >>>> + * The resource has been allocated by migration will be reused
> >>>> + * in COLO process, so don't release them.
> >>>> + */
> >>>> + if (runstate_check(RUN_STATE_POSTMIGRATE) && !migrate_colo_enabled()) {
> >>>> + qemu_savevm_state_cleanup();
> >>>> + }
> >>>> vm_start();
> >>>> }
> >>>> }
> >>>> --
> >>>> 1.8.3.1
> >>>>
> >>>>
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>
> >>> .
> >>>
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK