qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Wangxin (Alexander)" <wangxinxin.wang@huawei.com>, mst@redhat.com
Cc: "Wuchenye (karot, Cloud Infrastructure Service Product Dept)" <wuchenye@huawei.com>,
	"Zhoujian (jay)" <jianjay.zhou@huawei.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"quintela@redhat.com" <quintela@redhat.com>
Subject: Re: [RFC]migration: stop/start device at the end of live migration concurrently
Date: Mon, 1 Mar 2021 16:02:23 +0000
Message-ID: <YD0QD+6IZ2LkNnRN@work-vm>
In-Reply-To: <c716d92c659149f6bdb00c9aa642abf9@huawei.com>

* Wangxin (Alexander) (wangxinxin.wang@huawei.com) wrote:
> Hi all,

(copying in Michael, the vhost-user maintainer).

> We found that the downtime of migration reaches several seconds when live
> migrating a huge VM with 224 vCPUs, 180 GiB of RAM, 16 vhost-user NICs
> (32 queues each) and 24 vhost-user-blk disks (4 queues each); most of the
> time is spent stopping the devices at the source and starting them at the
> destination.

I suspect that's more vhost-user devices than anyone else has run on a
single VM!

> Our idea is to stop the devices through multiple threads at the end of
> migration. To be more specific, we create a thread pool at the beginning
> of live migration; when the migration thread calls the virtio_vmstate_change
> callback to stop or start a device in vm_state_notify, it submits a request
> to the thread pool so that the callbacks are handled concurrently.
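
For reference, the dispatch pattern being described is roughly the following.
This is a minimal standalone sketch using plain pthreads, with hypothetical
names such as fake_virtio_vmstate_change(); it is not the submitters' actual
patch:

    /*
     * Sketch: dispatch per-device stop/start callbacks to a small worker
     * pool and have the migration thread wait for all of them to finish.
     */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_DEVICES 8
    #define NUM_WORKERS 4

    typedef struct {
        int dev_id;
        int running;             /* 0 = stop the device, 1 = start it */
    } StopRequest;

    static StopRequest requests[NUM_DEVICES];
    static int next_req;         /* next request index to hand out */
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Hypothetical stand-in for virtio_vmstate_change() on one device. */
    static void fake_virtio_vmstate_change(StopRequest *req)
    {
        printf("device %d: %s\n", req->dev_id,
               req->running ? "starting" : "stopping");
    }

    static void *worker(void *arg)
    {
        for (;;) {
            StopRequest *req = NULL;

            pthread_mutex_lock(&queue_lock);
            if (next_req < NUM_DEVICES) {
                req = &requests[next_req++];
            }
            pthread_mutex_unlock(&queue_lock);

            if (!req) {
                return NULL;     /* queue drained */
            }
            fake_virtio_vmstate_change(req);
        }
    }

    int main(void)
    {
        pthread_t workers[NUM_WORKERS];

        for (int i = 0; i < NUM_DEVICES; i++) {
            requests[i] = (StopRequest){ .dev_id = i, .running = 0 };
        }
        for (int i = 0; i < NUM_WORKERS; i++) {
            pthread_create(&workers[i], NULL, worker, NULL);
        }
        /* The migration thread would block here until every device stopped. */
        for (int i = 0; i < NUM_WORKERS; i++) {
            pthread_join(workers[i], NULL);
        }
        return 0;
    }
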
> 
> We live migrated the VM and measured the time spent at each stage of
> stopping/starting the devices, without and with the concurrent state change:
> 
>                 Stage                      Original    Concurrent
>   Src   disk    get vring base                 36ms         18ms
>                 disable guest notify           48ms         32ms
>                 disable host notify           300ms        120ms
>         net     get vring base               1376ms        294ms
>                 disable host notify          1011ms        116ms
>                 disable guest notify           59ms         40ms
>   Dst   net     enable guest notify           310ms         97ms
>                 set mem table                  48ms         20ms
>                 enable host notify           2022ms        114ms
>         disk    enable host notify            312ms         78ms
>                 enable guest notify            32ms         23ms
>                 set mem table                  16ms         10ms
>   Total downtime                             5600ms        962ms
> 
> However, there are some side effects:
> 1. When the host notify or guest notify is disabled concurrently, the VM
> crashes because the same notifier can be disabled from different threads. We
> currently add two separate locks to solve this, but that is a hack and may
> lead to other problems.
> 
> 2. As the QEMU BQL is held by the migration thread before stopping devices in
> migration_completion, a deadlock occurs in the following scenario:
> migration_thread                              [thread 1]
>   set_up_multithread
>   ...
>   migration_completion()# get QEMU BQL
>     qemu_mutex_lock_iothread()
>     vm_stop_force_state()
>     ...
>       submit stopping device request
>       to thread pool
>                                            virtio_vmstate_change
>                                              virtio_set_status
>                                              ...
>                                                memory_region_transaction_begin
>                                                ...
>                                                  prepare_mmio_access
>                                                    qemu_mutex_iothread_locked()# N
>                                                    qemu_mutex_lock_iothread()# deadlock
> 
> We now add another lock to replace the BQL in this scenario, but we think
> this is not reliable enough: there is a risk that other code paths will also
> take the QEMU BQL while the devices are being stopped. My question is: how
> should the conflict with the QEMU BQL be handled properly?
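
The core of that deadlock can be modelled outside QEMU roughly as below.
This is a standalone sketch with hypothetical names (fake_bql stands in for
the BQL, the thread-local flag for what qemu_mutex_iothread_locked() reports),
and it intentionally hangs when run, to mirror the scenario above:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t fake_bql = PTHREAD_MUTEX_INITIALIZER;
    static __thread int i_hold_bql;   /* per-thread "I hold the BQL" flag */

    static void fake_lock_iothread(void)
    {
        pthread_mutex_lock(&fake_bql);
        i_hold_bql = 1;
    }

    static void fake_unlock_iothread(void)
    {
        i_hold_bql = 0;
        pthread_mutex_unlock(&fake_bql);
    }

    /* What the worker ends up doing inside prepare_mmio_access(). */
    static void *worker_stop_device(void *arg)
    {
        if (!i_hold_bql) {            /* flag is per thread, so false here */
            printf("worker: trying to take the BQL...\n");
            fake_lock_iothread();     /* blocks: the migration thread holds it */
            fake_unlock_iothread();
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t worker;

        fake_lock_iothread();         /* migration_completion() takes the BQL */
        pthread_create(&worker, NULL, worker_stop_device, NULL);
        pthread_join(worker, NULL);   /* ...and waits for the worker: deadlock */
        fake_unlock_iothread();
        return 0;
    }
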
> 
> Any advice will be appreciated, thanks.

To me it feels like the other way to do this would be to explicitly split
each of these stages into two: one part that sends the request to the vhost
device, and another that waits for the response from the vhost-user device
(i.e. in the vhost-user case, after the vhost_user_write but before the
vhost_user_read).  So instead of parallelising everything in threads, you'd
parallelise all of the corresponding operations, so that all of the
get_vring_base's happen at the same time.
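
As a rough illustration of that split (hypothetical helpers, not the existing
VhostOps interface): issue the "send" half of every GET_VRING_BASE first, then
collect all of the replies, instead of doing send+wait serially per queue or
device:

    #include <stdio.h>

    #define NUM_QUEUES 32

    /* Stand-ins for the send and receive halves of a vhost-user request
     * (roughly the vhost_user_write / vhost_user_read split above). */
    static void send_get_vring_base(int queue)
    {
        printf("send GET_VRING_BASE for queue %d\n", queue);
    }

    static unsigned wait_get_vring_base_reply(int queue)
    {
        printf("wait for GET_VRING_BASE reply on queue %d\n", queue);
        return 0;    /* the last avail index would come back here */
    }

    int main(void)
    {
        unsigned last_avail[NUM_QUEUES];

        /* Phase 1: fire off every request without waiting. */
        for (int q = 0; q < NUM_QUEUES; q++) {
            send_get_vring_base(q);
        }
        /* Phase 2: collect all of the replies. */
        for (int q = 0; q < NUM_QUEUES; q++) {
            last_avail[q] = wait_get_vring_base_reply(q);
        }
        (void)last_avail;
        return 0;
    }
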

Michael: Would it make sense to change VhostOps get_vring_base and many of
the others into two-part operations?
(or maybe coroutines with a yield in them???)

Dave
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



