* [RFC]migration: stop/start device at the end of live migration concurrently
@ 2021-03-01 15:09 Wangxin (Alexander)
From: Wangxin (Alexander) @ 2021-03-01 15:09 UTC (permalink / raw)
To: qemu-devel@nongnu.org
Cc: Wuchenye (karot, Cloud Infrastructure Service Product Dept),
Zhoujian (jay), dgilbert@redhat.com, quintela@redhat.com
Hi all,
We found that migration downtime can reach several seconds when live
migrating a huge VM with 224 vCPUs, 180 GiB of RAM, 16 vhost-user NICs
(32 queues each) and 24 vhost-user-blk disks (4 queues each). Most of that
time is spent stopping the devices on the source and starting them on the
destination.
Our idea is to stop the devices from multiple threads at the end of
migration. More specifically, we create a thread pool at the beginning of
live migration; when the migration thread would normally invoke the
virtio_vmstate_change callback in vm_state_notify to stop or start a device,
it instead submits a request to the thread pool so that the callbacks are
handled concurrently.
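
Roughly, the shape of the change is the following. This is an illustrative
sketch only: glib's GThreadPool stands in for whatever pool the actual
patches create, the StateChangeJob type, worker count and helper names are
made up for the example, and the locking problems described further down are
ignored here:

  /*
   * Illustrative sketch only (not the actual patches): run the VM state
   * change callbacks that vm_state_notify() would otherwise invoke one
   * after another from a small worker pool, then wait for all of them.
   * The types below are simplified stand-ins for QEMU's internal
   * VMChangeStateEntry list.
   */
  #include <glib.h>

  typedef void StateChangeCallback(void *opaque, int running, int state);

  typedef struct {
      StateChangeCallback *cb;   /* e.g. virtio_vmstate_change */
      void *opaque;              /* the device */
      int running;
      int state;
  } StateChangeJob;

  /* GFunc run by each pool worker: one device callback per job. */
  static void state_change_worker(gpointer data, gpointer user_data)
  {
      StateChangeJob *job = data;

      job->cb(job->opaque, job->running, job->state);
      g_free(job);
  }

  /* Concurrent replacement for the sequential callback loop. */
  static void notify_state_change_concurrently(StateChangeJob **jobs,
                                               guint n_jobs, guint n_workers)
  {
      GThreadPool *pool = g_thread_pool_new(state_change_worker, NULL,
                                            n_workers, FALSE, NULL);

      for (guint i = 0; i < n_jobs; i++) {
          g_thread_pool_push(pool, jobs[i], NULL);
      }

      /* Run everything still queued, wait for running jobs, then free. */
      g_thread_pool_free(pool, FALSE, TRUE);
  }

The key point is only the final call: the migration thread still blocks until
every callback has finished, but the per-device round trips now overlap.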

We live migrated the VM and measured the time spent in the different stages
of stopping/starting the devices.

                                  Cost    Original    With state change concurrently
  -----------------------------------------------------------------------------------
  Src   disk   get vring base             36ms        18ms
               disable guest notify       48ms        32ms
               disable host notify        300ms       120ms
        net    get vring base             1376ms      294ms
               disable host notify        1011ms      116ms
               disable guest notify       59ms        40ms
  -----------------------------------------------------------------------------------
  Dst   net    enable guest notify        310ms       97ms
               set mem table              48ms        20ms
               enable host notify         2022ms      114ms
        disk   enable host notify         312ms       78ms
               enable guest notify        32ms        23ms
               set mem table              16ms        10ms
  -----------------------------------------------------------------------------------
  Total Downtime                          5600ms      962ms

However, there are some side effects:

1. When the host notifiers or guest notifiers are disabled concurrently, the
VM crashes because the same notifier can be disabled from different threads.
We currently add two separate locks to avoid this, but that is a hack and may
introduce other problems.

2. Since the QEMU BQL is held by the migration thread before the devices are
stopped in migration_completion, the following scenario deadlocks:
    migration_thread [thread 1]
      set_up_multithread
      ...
      migration_completion()                 # takes the QEMU BQL
        qemu_mutex_lock_iothread()
        vm_stop_force_state()
          ...
          submit stopping-device request to thread pool

    thread pool worker
      virtio_vmstate_change
        virtio_set_status
          ...
          memory_region_transaction_begin
            ...
            prepare_mmio_access
              qemu_mutex_iothread_locked()   # returns false
              qemu_mutex_lock_iothread()     # deadlock: BQL held by thread 1

For now we add another lock to replace the BQL on this path, but we do not
think this is reliable enough: there is a risk that other code paths still
take the QEMU BQL while the devices are being stopped. Our question is: how
should the conflict with the QEMU BQL be handled properly? A minimal,
self-contained illustration of the deadlock follows below.
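
To make the locking conflict concrete, here is the deadlock pattern reduced
to a standalone program, with a plain pthread mutex standing in for the QEMU
BQL and a single worker standing in for the pool (none of these names are
QEMU's; the mapping to the trace above is in the comments):

  /*
   * The "migration thread" holds a global lock (standing in for the BQL)
   * while it waits for a worker, and the worker needs that same lock to
   * finish, just as prepare_mmio_access() does when it calls
   * qemu_mutex_lock_iothread().  Running this program hangs forever.
   */
  #include <pthread.h>
  #include <stdio.h>

  static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;  /* "BQL" */

  static void *stop_device_worker(void *arg)
  {
      /* virtio_vmstate_change -> ... -> prepare_mmio_access */
      pthread_mutex_lock(&big_lock);   /* blocks: main thread owns the lock */
      printf("device stopped\n");      /* never reached */
      pthread_mutex_unlock(&big_lock);
      return NULL;
  }

  int main(void)
  {
      pthread_t worker;

      /* migration_completion(): take the "BQL" ... */
      pthread_mutex_lock(&big_lock);

      /* ... submit the stop request to the pool ... */
      pthread_create(&worker, NULL, stop_device_worker, NULL);

      /* ... and wait for it to finish while still holding the lock. */
      pthread_join(worker, NULL);      /* deadlock: worker waits on big_lock */

      pthread_mutex_unlock(&big_lock); /* never reached */
      return 0;
  }

The hang is independent of how the work is queued: any scheme in which the
BQL holder blocks on threads that themselves need the BQL reproduces it.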
Any advice will be appreciated, thanks.
* Re: [RFC]migration: stop/start device at the end of live migration concurrently
From: Dr. David Alan Gilbert @ 2021-03-01 16:02 UTC (permalink / raw)
To: Wangxin (Alexander), mst
Cc: Wuchenye (karot, Cloud Infrastructure Service Product Dept),
Zhoujian (jay), qemu-devel@nongnu.org, quintela@redhat.com
* Wangxin (Alexander) (wangxinxin.wang@huawei.com) wrote:
> Hi all,
(copying in Michael as the vhost-user maintainer).
> We found that migration downtime can reach several seconds when live
> migrating a huge VM with 224 vCPUs, 180 GiB of RAM, 16 vhost-user NICs
> (32 queues each) and 24 vhost-user-blk disks (4 queues each). Most of that
> time is spent stopping the devices on the source and starting them on the
> destination.
I suspect that's more vhost-user devices than anyone else has run on a
single VM!
> Our idea is to stop the devices from multiple threads at the end of
> migration. More specifically, we create a thread pool at the beginning of
> live migration; when the migration thread would normally invoke the
> virtio_vmstate_change callback in vm_state_notify to stop or start a device,
> it instead submits a request to the thread pool so that the callbacks are
> handled concurrently.
>
> We live migrated the VM and measured the time spent in the different stages
> of stopping/starting the devices.
>
>                                   Cost    Original    With state change concurrently
>   -----------------------------------------------------------------------------------
>   Src   disk   get vring base             36ms        18ms
>                disable guest notify       48ms        32ms
>                disable host notify        300ms       120ms
>         net    get vring base             1376ms      294ms
>                disable host notify        1011ms      116ms
>                disable guest notify       59ms        40ms
>   -----------------------------------------------------------------------------------
>   Dst   net    enable guest notify        310ms       97ms
>                set mem table              48ms        20ms
>                enable host notify         2022ms      114ms
>         disk   enable host notify         312ms       78ms
>                enable guest notify        32ms        23ms
>                set mem table              16ms        10ms
>   -----------------------------------------------------------------------------------
>   Total Downtime                          5600ms      962ms
>
> However, there are some side effects:
>
> 1. When the host notifiers or guest notifiers are disabled concurrently, the
> VM crashes because the same notifier can be disabled from different threads.
> We currently add two separate locks to avoid this, but that is a hack and may
> introduce other problems.
>
> 2. Since the QEMU BQL is held by the migration thread before the devices are
> stopped in migration_completion, the following scenario deadlocks:
>     migration_thread [thread 1]
>       set_up_multithread
>       ...
>       migration_completion()                 # takes the QEMU BQL
>         qemu_mutex_lock_iothread()
>         vm_stop_force_state()
>           ...
>           submit stopping-device request to thread pool
>
>     thread pool worker
>       virtio_vmstate_change
>         virtio_set_status
>           ...
>           memory_region_transaction_begin
>             ...
>             prepare_mmio_access
>               qemu_mutex_iothread_locked()   # returns false
>               qemu_mutex_lock_iothread()     # deadlock: BQL held by thread 1
>
> For now we add another lock to replace the BQL on this path, but we do not
> think this is reliable enough: there is a risk that other code paths still
> take the QEMU BQL while the devices are being stopped. Our question is: how
> should the conflict with the QEMU BQL be handled properly?
>
> Any advice will be appreciated, thanks.
To me it feels like the other way to do this would be to explicitly split
each of these stages into two parts: one that sends the request to the
vhost-user device and one that waits for its response (i.e. in the vhost-user
case, the split point is after the vhost_user_write but before the
vhost_user_read). So instead of parallelising everything in threads, you'd
parallelise all of the corresponding operations, so that all of the
get_vring_base requests are in flight at the same time.

Michael: would it make sense to change VhostOps get_vring_base and many of
the others into two-part operations? (Or maybe coroutines with a yield in?)
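
For what it's worth, a rough sketch of that split for vhost-user's
get_vring_base, written against the existing vhost_user_write()/
vhost_user_read() helpers in hw/virtio/vhost-user.c. The _begin/_end names,
and how they would be plumbed through VhostOps, are purely illustrative and
not from any actual patch:

  /*
   * Illustrative only: a two-phase split of vhost_user_get_vring_base().
   * The existing function sends VHOST_USER_GET_VRING_BASE and immediately
   * waits for the reply; here the send and the wait are separated so that
   * many requests can be outstanding at once.  Error handling and the
   * other bookkeeping of the existing function are omitted.
   */
  static int vhost_user_get_vring_base_begin(struct vhost_dev *dev,
                                             struct vhost_vring_state *ring)
  {
      VhostUserMsg msg = {
          .hdr.request = VHOST_USER_GET_VRING_BASE,
          .hdr.flags = VHOST_USER_VERSION,
          .payload.state = *ring,
          .hdr.size = sizeof(msg.payload.state),
      };

      /* Phase 1: post the request to the vhost-user backend and return. */
      return vhost_user_write(dev, &msg, NULL, 0);
  }

  static int vhost_user_get_vring_base_end(struct vhost_dev *dev,
                                           struct vhost_vring_state *ring)
  {
      VhostUserMsg msg;

      /* Phase 2: collect and validate the reply. */
      if (vhost_user_read(dev, &msg) < 0) {
          return -1;
      }
      if (msg.hdr.request != VHOST_USER_GET_VRING_BASE ||
          msg.hdr.size != sizeof(msg.payload.state)) {
          return -1;
      }
      *ring = msg.payload.state;
      return 0;
  }

A caller such as vhost_virtqueue_stop() would then issue _begin for every
queue of every device first and only afterwards loop over _end, so the round
trips to the vhost-user backends overlap instead of serialising.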
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK