From: Peter Xu <peterx@redhat.com>
To: gudkov.andrei@huawei.com
Cc: qemu-devel@nongnu.org, quintela@redhat.com, eblake@redhat.com,
armbru@redhat.com, berrange@redhat.com, zhengchuan@huawei.com
Subject: Re: [PATCH v2 0/4] Migration time prediction using calc-dirty-rate
Date: Wed, 31 May 2023 11:03:37 -0400 [thread overview]
Message-ID: <ZHdhycvhn/lWJQxy@x1n> (raw)
In-Reply-To: <ZHdd0BDefsv02SWX@DESKTOP-0LHM7NF.china.huawei.com>
On Wed, May 31, 2023 at 05:46:40PM +0300, gudkov.andrei@huawei.com wrote:
> On Tue, May 30, 2023 at 11:46:50AM -0400, Peter Xu wrote:
> > Hi, Andrei,
> >
> > On Thu, Apr 27, 2023 at 03:42:56PM +0300, Andrei Gudkov via wrote:
> > > Afterwards we tried to migrate the VM after randomly selecting max
> > > downtime and bandwidth limit. Typical prediction error is 6-7%, with
> > > only 180 out of 5779 experiments failing badly: prediction error >=25%,
> > > or migration success predicted when in fact it did not converge.
> >
> > What's the normal size of the VMs when you did the measurements?
>
> VM size in all experiments was 32GiB. However, since some of the pages
> are zero, the effective VM size was smaller. I checked the value of
> precopy-bytes counter after the first migration iteration. Median value
> among all experiments is 24.3GiB.
>
> >
> > A major challenge of convergence issues come from huge VMs and I'm
> > wondering whether those are covered in the prediction verifications.
>
> Hmmm... My understanding is that convergence primarily depends on how
> aggressively the VM dirties pages, not on VM size. A small VM with
> aggressive writes would be impossible to migrate without throttling.
> Conversely, migration of a huge dormant VM will converge in just a single
> iteration (although a long one). The only reason I can imagine why a large
> VM size could negatively affect convergence is the following chain: larger
> VM size => more vCPUs => more memory writes per second.
> Or do you perhaps mean that during each iteration we perform
> KVM_CLEAR_DIRTY_LOG, which is (I suspect) linear in time and can become
> a bottleneck for large VMs?
Partly yes, though not specifically CLEAR_LOG; I meant the whole process,
which may still scale with the size of guest memory, and I was curious
whether the prediction can stay accurate as memory size grows.

I assume a huge VM normally has more cores too, and it's even less likely
to be idle if there's a real customer using it (outside a lab, if I were a
huge-VM tenant I wouldn't want it idle at any time). With more cores
there's definitely a higher chance of high dirty rates, especially with
the larger memory pool.
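To make the concern concrete, here is a rough back-of-the-envelope model of
iterative pre-copy (plain Python; the rates and sizes below are illustrative
assumptions of mine, not measurements from the series):

```python
# Rough model of iterative pre-copy convergence (illustrative only).
# Each iteration resends the pages dirtied during the previous iteration;
# the dirty set only shrinks while dirty_rate < bandwidth.

def simulate_precopy(ram_bytes, dirty_rate, bandwidth, max_downtime_s,
                     max_iters=50):
    """Return (converged, iterations, final_transfer_time_s)."""
    remaining = ram_bytes                  # bytes left to send this round
    for it in range(1, max_iters + 1):
        xfer_time = remaining / bandwidth
        if xfer_time <= max_downtime_s:
            return True, it, xfer_time     # can stop the VM and finish
        # While we were sending, the guest dirtied more memory.
        remaining = min(ram_bytes, dirty_rate * xfer_time)
        if dirty_rate >= bandwidth:
            break                          # dirty set no longer shrinks
    return False, max_iters, remaining / bandwidth

GiB = 1 << 30
MiB = 1 << 20

# A mostly dormant 32 GiB guest converges in a few iterations despite its
# size (50 MiB/s dirty rate, 1 GiB/s link, 300 ms downtime budget).
ok, iters, dt = simulate_precopy(32 * GiB, 50 * MiB, 1 * GiB, 0.3)

# A small guest with aggressive writes (2 GiB/s dirtied, 1 GiB/s link)
# never converges without throttling, regardless of its 4 GiB size.
bad, _, _ = simulate_precopy(4 * GiB, 2 * GiB, 1 * GiB, 0.3)
```

This is exactly the "size matters only through dirty rate" intuition: size
sets the length of the first iteration, while the dirty-rate/bandwidth ratio
decides whether later iterations shrink.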
> Anyway, I will conduct experiments with large VMs.
Thanks.
>
> I think that the easiest way to predict whether VM migration will converge
> is the following. Run calc-dirty-rate with calc-time equal to the desired
> downtime. If it reports that the volume of memory dirtied over the
> calc-time period is larger than what you can copy over the network in the
> same time, then you are out of luck. Alas, at the moment calc-time accepts
> values in units of seconds, while reasonable downtime lies in the range of
> 50-300ms. I am preparing a separate patch that will allow specifying
> calc-time in milliseconds. I hope this approach will be cleaner than the
> array of hardcoded values I introduced in my original patch.
I actually haven't personally gone through the details of the new
interface yet, but what you said sounds reasonable, and I'm happy to read
the new version.
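For what it's worth, the check you describe reduces to a one-line comparison
once calc-dirty-rate can run with a millisecond-scale calc-time. A minimal
sketch (the function name and the sample numbers are mine, not the patch's
interface; dirtied bytes would come from a calc-dirty-rate run with
calc-time equal to the downtime budget):

```python
MiB = 1 << 20
GiB = 1 << 30

def can_meet_downtime(dirtied_bytes, downtime_s, bandwidth_bytes_s):
    # The memory dirtied during one downtime-sized window must fit into
    # what the link can push in that same window; otherwise the final
    # stop-and-copy phase can never finish within the downtime budget.
    return dirtied_bytes <= bandwidth_bytes_s * downtime_s

# Suppose a 300 ms measurement window reported 120 MiB dirtied, and the
# link is limited to 1 GiB/s (so ~307 MiB transferable in 300 ms):
fits = can_meet_downtime(120 * MiB, 0.3, 1 * GiB)        # True
too_hot = can_meet_downtime(500 * MiB, 0.3, 1 * GiB)     # False
```

The appeal of this formulation is that it needs no hardcoded calc-time
table: the measurement window and the downtime budget are the same number.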
--
Peter Xu
Thread overview: 22+ messages
2023-04-27 12:42 [PATCH v2 0/4] Migration time prediction using calc-dirty-rate Andrei Gudkov via
2023-04-27 12:42 ` [PATCH v2 1/4] migration/calc-dirty-rate: replaced CRC32 with xxHash Andrei Gudkov via
2023-05-10 16:54 ` Juan Quintela
2023-04-27 12:42 ` [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode Andrei Gudkov via
2023-05-10 17:36 ` Juan Quintela
2023-05-12 13:18 ` gudkov.andrei--- via
2023-05-15 8:22 ` Juan Quintela
2023-05-18 14:45 ` gudkov.andrei--- via
2023-05-18 15:13 ` Juan Quintela
2023-05-15 8:23 ` Juan Quintela
2023-05-11 6:14 ` Markus Armbruster
2023-05-12 14:24 ` gudkov.andrei--- via
2023-05-30 3:06 ` Wang, Lei
2023-04-27 12:42 ` [PATCH v2 3/4] migration/calc-dirty-rate: added n-zero-pages metric Andrei Gudkov via
2023-05-10 17:57 ` Juan Quintela
2023-04-27 12:43 ` [PATCH v2 4/4] migration/calc-dirty-rate: tool to predict migration time Andrei Gudkov via
2023-05-10 18:01 ` Juan Quintela
2023-05-30 3:21 ` Wang, Lei
2023-06-02 13:06 ` gudkov.andrei--- via
2023-05-30 15:46 ` [PATCH v2 0/4] Migration time prediction using calc-dirty-rate Peter Xu
2023-05-31 14:46 ` gudkov.andrei--- via
2023-05-31 15:03 ` Peter Xu [this message]