qemu-devel.nongnu.org archive mirror
* [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate
@ 2023-01-17 11:55 Chuang Xu
From: Chuang Xu @ 2023-01-17 11:55 UTC
  To: qemu-devel; +Cc: dgilbert, quintela, pbonzini, peterx, david, philmd, zhouyibo

In this version:

- rename rcu_read_locked() to rcu_read_is_locked().
- adjust the sanity check in address_space_to_flatview().
- improve some comments.
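
A rough sketch of what such a sanity check can look like, as a toy model rather
than the code in this series. It assumes the check has the form "no memory
transaction is open, or the caller holds the RCU read lock". Every name below
(toy_rcu_read_lock(), open_txn_depth, flatview_access_is_safe(), ...) is
hypothetical; QEMU's real RCU reader state and flatview lookup are more
involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not QEMU's real internals. */
static _Thread_local unsigned rcu_depth; /* per-thread read-side nesting */
static unsigned open_txn_depth;          /* open memory transactions */

static void toy_rcu_read_lock(void)
{
    rcu_depth++;
}

static void toy_rcu_read_unlock(void)
{
    assert(rcu_depth > 0);
    rcu_depth--;
}

/* True while the calling thread is inside a read-side critical section. */
static bool toy_rcu_read_is_locked(void)
{
    return rcu_depth > 0;
}

/*
 * Assumed form of the check: the current map may be read either when no
 * transaction is open, or when the reader holds the RCU read lock and can
 * therefore tolerate a possibly obsolete map.
 */
static bool flatview_access_is_safe(void)
{
    return open_txn_depth == 0 || toy_rcu_read_is_locked();
}
```

In this model a lookup made while a transaction is open fails the check unless
it is bracketed by toy_rcu_read_lock() and toy_rcu_read_unlock().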

Loading non-iterable vmstate accounts for a significant portion of downtime
(measured from the timestamp at which the source QEMU stops to the timestamp
at which the target QEMU starts). Most of that time is spent committing
memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. The more devices
a VM has, the larger the improvement.
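
To illustrate why one enclosing transaction helps, here is a minimal toy model
of nested transactions in which only the outermost commit triggers a rebuild.
It is a sketch under stated assumptions, not the code in this series: all names
(txn_begin(), txn_commit(), load_one_device(), load_devices()) are
hypothetical, and it only loosely mirrors the nesting behaviour of QEMU's
memory_region_transaction_begin() / memory_region_transaction_commit().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not QEMU's real internals. */
static unsigned txn_depth;     /* nesting depth of open transactions */
static bool update_pending;    /* a region changed since the last rebuild */
static unsigned rebuild_count; /* how many times the flat view was rebuilt */

static void txn_begin(void)
{
    txn_depth++;
}

static void txn_commit(void)
{
    assert(txn_depth > 0);
    /* Only the outermost commit pays for the rebuild. */
    if (--txn_depth == 0 && update_pending) {
        update_pending = false;
        rebuild_count++;
    }
}

/* Stand-in for one device's load step that changes a memory region. */
static void load_one_device(void)
{
    txn_begin();
    update_pending = true;
    txn_commit();
}

/* Load n devices and return how many rebuilds that cost. */
static unsigned load_devices(unsigned n, bool batched)
{
    rebuild_count = 0;
    if (batched) {
        txn_begin();   /* one transaction around the whole load */
    }
    for (unsigned i = 0; i < n; i++) {
        load_one_device();
    }
    if (batched) {
        txn_commit();  /* single rebuild happens here */
    }
    return rebuild_count;
}
```

In this model, load_devices(24, false) costs 24 rebuilds while
load_devices(24, true) costs 1, which is the effect the batching relies on.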

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net devices
  - 16 4-queue vhost-user-blk devices

         time of loading non-iterable vmstate    downtime
before   about 150 ms                            740+ ms
after    about 30 ms                             630+ ms

(These results differ from those in v1. Something may have changed on my
host, but the optimization effect is still clearly visible.)


In test2, we keep the number of devices the same as in test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net devices
  - 16 1-queue vhost-user-blk devices

         time of loading non-iterable vmstate    downtime
before   about 90 ms                             about 250 ms
after    about 25 ms                             about 160 ms



In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

         time of loading non-iterable vmstate    downtime
before   about 20 ms                             about 70 ms
after    about 11 ms                             about 60 ms


As the test results above show, both the number of queues and the number of
devices have a great impact on the time of loading non-iterable vmstate.
More devices and more queues lead to more memory region (mr) commits, and
the time spent on flatview reconstruction increases accordingly.

Please review, Chuang.

[v4]

- attach more information in the cover letter.
- remove changes on virtio_load.
- add rcu_read_locked() to detect holding of rcu lock.

[v3]

- move virtio_load_check_delay() from virtio_memory_listener_commit() to 
  virtio_vmstate_change().
- add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
  will be called when delay_check is true.

[v2]

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

[v1]

Loading non-iterable vmstate accounts for a significant portion of downtime
(measured from the timestamp at which the source QEMU stops to the timestamp
at which the target QEMU starts). Most of that time is spent committing
memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. The more devices
a VM has, the larger the improvement.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net devices
- 16 4-queue vhost-user-blk devices

         time of loading non-iterable vmstate
before   about 210 ms
after    about 40 ms




Thread overview: 28+ messages
2023-01-17 11:55 [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate Chuang Xu
2023-01-17 11:55 ` [RFC v5 1/3] rcu: introduce rcu_read_is_locked() Chuang Xu
2023-02-02 10:59   ` Juan Quintela
2023-02-14  7:57     ` Chuang Xu
2023-01-17 11:55 ` [RFC v5 2/3] memory: add depth assert in address_space_to_flatview Chuang Xu
2023-02-08 19:31   ` Juan Quintela
2023-01-17 11:55 ` [RFC v5 3/3] migration: reduce time of loading non-iterable vmstate Chuang Xu
2023-02-02 11:01   ` Juan Quintela
2023-01-17 15:41 ` [RFC v5 0/3] " Peter Xu
2023-02-02 11:07 ` Juan Quintela
2023-02-15 17:00 ` Juan Quintela
2023-02-15 17:06 ` Claudio Fontana
2023-02-15 19:10 ` Juan Quintela
2023-02-16 15:41   ` Chuang Xu
     [not found]   ` <a555b989-27be-006e-0d00-9f1688c5be4e@bytedance.com>
2023-02-17  8:11     ` Chuang Xu
2023-02-17 15:52       ` Peter Xu
2023-02-20 13:36         ` Chuang Xu
2023-02-21  3:38         ` Chuang Xu
2023-02-21  8:57           ` Chuang Xu
2023-02-21 20:36             ` Peter Xu
2023-02-22  6:27               ` Chuang Xu
2023-02-22 15:57                 ` Peter Xu
2023-02-23  3:28                   ` Chuang Xu
2023-02-25 15:32                     ` Peter Xu
2023-02-27 13:19                       ` Chuang Xu
2023-02-27 20:56                         ` Peter Xu
2023-02-20  9:53   ` Chuang Xu
2023-02-20 12:07     ` Juan Quintela
