[RFC v1 0/2] migration: reduce bitmap sync time and make dirty pages converge much more easily

From: Chuang Xu @ 2025-12-08 12:09 UTC
  To: qemu-devel
  Cc: mst, sgarzare, richard.henderson, pbonzini, peterx, david, philmd,
	farosas

In our long-term experience at Bytedance, we've found that under the same load,
live migration of larger VMs with more devices is often harder to converge
(it requires a larger downtime limit).

We've observed that the live migration bandwidth estimate for large, multi-device
VMs is severely distorted, a phenomenon likely related to the problem described at
https://wiki.qemu.org/ToDo/LiveMigration#Optimize_migration_bandwidth_calculation.

Through testing and calculation, we conclude that the bitmap sync time skews
QEMU's live migration bandwidth estimate.

Below we use some formulaic reasoning to illustrate the relationship between the
bitmap sync time and the downtime limit required to meet the stop condition.

Assume the actual live migration bandwidth is B, the dirty page rate is D,
the bitmap sync time is x (ms), the transfer time per iteration is t (ms), and the
downtime limit is y (ms).

To simplify the calculation, we assume that none of the dirty pages are zero
pages and only consider the case B > D.

When x + t > 100 ms, the bandwidth calculated by QEMU is R = B * t / (x + t).
When x + t < 100 ms, the bandwidth calculated by QEMU is R = B * (100 - x) / 100.
(The 100 ms window is QEMU's bandwidth sampling interval, BUFFER_DELAY.)
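As a sanity check, here is a minimal sketch of this measurement model in plain C
(our own helper, not QEMU code; the 100 ms constant mirrors QEMU's BUFFER_DELAY
sampling interval):

#include <stdio.h>

/*
 * Model of the bandwidth R that QEMU observes: B is the actual
 * bandwidth, x the bitmap sync time (ms, during which no data flows),
 * t the transfer time per iteration (ms).
 */
static double measured_bw(double B, double x, double t)
{
    if (x + t > 100.0) {
        /* the iteration spans the sampling window: R = B * t / (x + t) */
        return B * t / (x + t);
    }
    /* the iteration fits in one window: x of the 100 ms are idle */
    return B * (100.0 - x) / 100.0;
}

int main(void)
{
    /* e.g. B = 15 GBps, x = 250 ms, t = 250 ms -> R = 7.50 GBps */
    printf("R = %.2f GBps\n", measured_bw(15.0, 250.0, 250.0));
    return 0;
}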

At the critical convergence state, the data transferred in one iteration equals
the data dirtied during that iteration (sync plus transfer), so we have:
  (1) B * t = D * (x + t)
  (2) t = D * x / (B - D)
For the stop condition to be met, we have two cases:
  When:
  (3) x + t > 100
  (4) x + D * x / (B - D) > 100
  (5) x > 100 - 100 * D / B
  Then:
  (6) R * y > D * (x + t)
  (7) B * t * y / (x + t) > D * (x + t)
  (8) (B * (D * x / (B - D)) * y) / (x + D * x / (B - D)) > D * (x + D * x / (B - D))
      (since x + D * x / (B - D) = B * x / (B - D), the left side reduces to D * y)
  (9) D * y > D * (x + D * x / (B - D))
  (10) y > x + D * x / (B - D)
  (11) (B - D) * y > B * x
  (12) y > B * x / (B - D)
  
  When:
  (13) x + t < 100
  (14) x + D * x / (B - D) < 100
  (15) x < 100 - 100 * D / B
  Then:
  (16) R * y > D * (x + t)
  (17) B * (100 - x) * y / 100 > D * (x + t)
  (18) B * (100 - x) * y / 100 > D * (x + D * x / (B - D))
  (19) y > 100 * D * x / ((B - D) * (100 - x))
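To make the two bounds easy to replay, here is a small self-contained C sketch
(our own helper, not part of this series) that evaluates formulas (2), (12) and
(19) for the sample values used below:

#include <stdio.h>

/*
 * Minimum downtime limit y (ms) at the critical convergence state.
 * B and D share a unit (e.g. GBps), x is in ms; assumes B > D.
 */
static double min_downtime(double B, double D, double x)
{
    double t = D * x / (B - D);                      /* formula (2) */

    if (x + t > 100.0) {
        return B * x / (B - D);                      /* formula (12) */
    }
    return 100.0 * D * x / ((B - D) * (100.0 - x));  /* formula (19) */
}

int main(void)
{
    /* the two sync times discussed below: 250 ms and 10 ms */
    printf("x = 250 ms: y > %.1f ms\n", min_downtime(15.0, 7.5, 250.0));
    printf("x = 10 ms:  y > %.1f ms\n", min_downtime(15.0, 7.5, 10.0));
    return 0;
}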

With these formulas in hand, we can compare some concrete numbers.

For a 64C256G VM with 8 vhost-user-net devices (32 queues per NIC) and 16
vhost-user-blk devices (4 queues per disk), the sync time is as high as 250 ms,
while after applying this series it is only 10 ms.

*First case: assume the maximum bandwidth can reach 15 GBps and the dirty page rate is 7.5 GBps.

If x = 250 ms, at the critical convergence state,
formula (2) gives t = D * x / (B - D) = 250 ms;
because x + t = 500 ms > 100 ms,
formula (12) gives y > B * x / (B - D) = 500 ms.

If x = 10 ms, at the critical convergence state,
formula (2) gives t = D * x / (B - D) = 10 ms;
because x + t = 20 ms < 100 ms,
formula (19) gives y > 100 * D * x / ((B - D) * (100 - x)) = 11.1 ms.

We can see that after optimization, under the same bandwidth and dirty page rate,
the downtime limit required for dirty page convergence drops significantly
(from 500 ms to 11.1 ms).

*Second case: assume the maximum bandwidth can reach 15 GBps and the downtime limit is set to 300 ms.
If x = 250 ms, then x + t > 250 ms > 100 ms,
so rearranging formula (12) gives D < B * (y - x) / y = 15 * (300 - 250) / 300 = 2.5 GBps.

If x = 10 ms,
when x + t > 100 ms,
rearranging formula (12) gives D < B * (y - x) / y = 15 * (300 - 10) / 300 = 14.5 GBps;
when x + t < 100 ms,
rearranging formula (19) gives D < y * B * (100 - x) / (100 * x + y * (100 - x)) = 14.46 GBps.

We can see that after optimization, under the same bandwidth and downtime limit,
the dirty page rate that can still converge improves significantly (from 2.5 GBps
to about 14.5 GBps).
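For completeness, the same inversion in C (again our own sketch, with names of
our choosing; which bound applies still has to be checked against x + t as above):

#include <stdio.h>

/* Max convergent dirty rate from formula (12), x + t > 100 ms regime. */
static double max_rate_slow(double B, double y, double x)
{
    return B * (y - x) / y;
}

/* Max convergent dirty rate from formula (19), x + t < 100 ms regime. */
static double max_rate_fast(double B, double y, double x)
{
    return y * B * (100.0 - x) / (100.0 * x + y * (100.0 - x));
}

int main(void)
{
    printf("x = 250 ms: D < %.2f GBps\n", max_rate_slow(15.0, 300.0, 250.0));
    printf("x = 10 ms:  D < %.2f or %.2f GBps\n",
           max_rate_slow(15.0, 300.0, 10.0),
           max_rate_fast(15.0, 300.0, 10.0));
    return 0;
}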

Through the above derivation, we have shown that reducing the bitmap sync time
can significantly improve dirty page convergence.

This series only optimizes the bitmap sync time in some scenarios.
There may still be many scenarios where the bitmap sync time hurts dirty page
convergence, and the same approach can be tried there as well.


