Re: [Qemu-devel] [PATCH] migration: optimize the downtime

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jay Zhou <jianjay.zhou@huawei.com>,
	rkrcmar@redhat.com, qemu-devel@nongnu.org, quintela@redhat.com,
	armbru@redhat.com, arei.gonglei@huawei.com,
	zhang.zhanghailiang@huawei.com, wangxinxin.wang@huawei.com,
	weidong.huang@huawei.com, aarcange@redhat.com,
	Xiao Guangrong <xiaoguangrong@tencent.com>
Subject: Re: [Qemu-devel] [PATCH] migration: optimize the downtime
Date: Tue, 25 Jul 2017 20:15:42 +0100	[thread overview]
Message-ID: <20170725191542.GD2820@work-vm> (raw)
In-Reply-To: <a05bf0cb-77a3-c372-5b9d-d9c15a8bd8a1@redhat.com>

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 24/07/2017 21:03, Dr. David Alan Gilbert wrote:
> >> I don't like having such a long-lived mutex (it seems like a recipe for
> >> deadlocks with the BQL), plus memory_region_transaction_commit (the
> >> expensive part of memory_global_dirty_log_stop) needs to be under the
> >> BQL itself because it calls MemoryListeners.
> >>
> >> Maybe memory_global_dirty_log_stop can delay itself to the next vm_start
> >> if it's called while runstate_running() returns false (which should be
> >> always the case)?
> >>
> >> It could even be entirely enclosed within memory.c if you do it with a
> >> VMChangeStateHandler.
> > This still causes the BQL to be held for quite a while; albeit at a less
> > critical point.
> > 
> > In this and the existing case we don't actually need efficiency - what we need is just
> > not to be holding onto the BQL for so long;  could we do a less
> > efficient commit here, removing one region at a time, yielding the lock
> > and retaking it?
> 
> I'm worried that if you release the lock and retake it you open a window
> where someone else could start a commit, and
> memory_global_dirty_log_stop's commit would use outdated information.
> 
> The problem is that a coarse lock, such as the BQL, is very hard to
> break into another lock with long critical sections (a "memory map
> commit lock" in this case).  It's obviously very easy to say "the new
> lock nests inside the BQL"... but then, because you created the lock to
> move things out of the BQL, any callback inside the new lock cannot take
> the memory map commit lock, and you need to make entire subsystems
> thread-safe.  If the new lock nests outside the BQL instead, it's
> simpler to avoid deadlocks, but you need to be careful about releasing
> the BQL and any reentrancy implications of doing that.
> 
> One strategy that worked well so far has been to use two locks, one of
> them being the BQL, where you can take either lock for reads and both
> for writes.

I guess you could say that in the event someone wants to do something
with the memory map they have to wait for the cleanup to complete?

From a bit of stracing though, it looks like even this isn't enough;
with a 70G VM:
we've got a worst case of about 150ms for any one region:
40430@1501009325.613214:kvm_log_stop_entry
40430@1501009325.760481:kvm_log_stop_exit

of which the actual ioctl is:
39925      0.000042 ioctl(11, KVM_SET_USER_MEMORY_REGION <unfinished ...>
...
39925      0.099510 <... ioctl resumed> , 0x7fb3d11fe660) = 0

about 100ms.

but the whole chunk of them is:

40430@1501009325.451807:kvm_log_stop_entry
40430@1501009325.451893:kvm_log_stop_exit
40430@1501009325.451899:kvm_log_stop_entry
.....<lots more>....
40430@1501009325.613214:kvm_log_stop_entry
40430@1501009325.760481:kvm_log_stop_exit

so that's about 310ms.

So:
  a) either way we've got 100ms in one syscall even if we could chop
this into separate operations; so it's still depressingly long.
  b) Is there anything we can do in the kernel to make it less
  expensive?
  c) Could we build a list of KVM_SET_USER_MEMORY_REGION calls
  that were needed, and perform these in the background
  while letting the rest of the world carry on?  Any changes
  cause us to have to wait for completion.
  d) If this is still something that Jay sees on the new libvirt
  then we need to do something; so doing it on runtype change seems
  OKish - but there again ~300ms to restart the VM if the migration
  fails is objectionable as well.

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

next prev parent reply	other threads:[~2017-07-25 19:15 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-20  3:49 [Qemu-devel] [PATCH] migration: optimize the downtime Jay Zhou
2017-07-20  4:23 ` no-reply
2017-07-21  9:49 ` Dr. David Alan Gilbert
2017-07-21 12:23   ` Jay Zhou
2017-07-24 15:35     ` Dr. David Alan Gilbert
2017-07-24 16:33       ` Paolo Bonzini
2017-07-24 19:03         ` Dr. David Alan Gilbert
2017-07-24 20:38           ` Paolo Bonzini
2017-07-25 19:15             ` Dr. David Alan Gilbert [this message]
2017-07-27 14:26               ` Paolo Bonzini
2017-07-25  7:29         ` Jay Zhou
2017-07-25  8:18           ` Paolo Bonzini
2017-07-25  7:09       ` Jay Zhou
2017-07-25 10:34         ` Dr. David Alan Gilbert
2017-07-31  7:04           ` Jay Zhou
2017-07-31 13:33             ` Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170725191542.GD2820@work-vm \
    --to=dgilbert@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=arei.gonglei@huawei.com \
    --cc=armbru@redhat.com \
    --cc=jianjay.zhou@huawei.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=wangxinxin.wang@huawei.com \
    --cc=weidong.huang@huawei.com \
    --cc=xiaoguangrong@tencent.com \
    --cc=zhang.zhanghailiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.