From: Peter Xu <peterx@redhat.com>
To: gudkov.andrei@huawei.com
Cc: qemu-devel@nongnu.org, quintela@redhat.com, eblake@redhat.com,
armbru@redhat.com, berrange@redhat.com, zhengchuan@huawei.com
Subject: Re: [PATCH v2 0/4] Migration time prediction using calc-dirty-rate
Date: Wed, 31 May 2023 11:03:37 -0400 [thread overview]
Message-ID: <ZHdhycvhn/lWJQxy@x1n> (raw)
In-Reply-To: <ZHdd0BDefsv02SWX@DESKTOP-0LHM7NF.china.huawei.com>
On Wed, May 31, 2023 at 05:46:40PM +0300, gudkov.andrei@huawei.com wrote:
> On Tue, May 30, 2023 at 11:46:50AM -0400, Peter Xu wrote:
> > Hi, Andrei,
> >
> > On Thu, Apr 27, 2023 at 03:42:56PM +0300, Andrei Gudkov via wrote:
> > > Afterwards we tried to migrate the VM after randomly selecting max
> > > downtime and bandwidth limit. Typical prediction error is 6-7%, with
> > > only 180 out of 5779 experiments failing badly: prediction error >=25%,
> > > or migration success predicted when in fact it did not converge.
> >
> > What's the normal size of the VMs when you did the measurements?
>
> VM size in all experiments was 32GiB. However, since some of the pages
> are zero, the effective VM size was smaller. I checked the value of
> precopy-bytes counter after the first migration iteration. Median value
> among all experiments is 24.3GiB.
>
> >
> > A major challenge of convergence issues come from huge VMs and I'm
> > wondering whether those are covered in the prediction verifications.
>
> Hmmm... My understanding is that convergence primarily depends on how
> aggressively the VM dirties pages, not on VM size. A small VM with
> aggressive writes would be impossible to migrate without throttling.
> Conversely, migration of a huge dormant VM will converge in just a single
> iteration (although a long one). The only reason I can imagine why a large
> VM size could negatively affect convergence is the following chain: larger
> VM size => more vCPUs => more memory writes per second.
> Or do you perhaps mean that during each iteration we perform
> KVM_CLEAR_DIRTY_LOG, which is (I suspect) linear in time and can become
> a bottleneck for large VMs?
Partly yes, though not specifically CLEAR_LOG; I meant the whole process,
which may still scale with the size of guest memory, and I was curious
whether the prediction can stay accurate as memory size grows.

I assume a huge VM normally has more cores too, and it's even less likely
to be idle if there's a real customer using it (outside a lab, if I were a
huge-VM tenant I wouldn't want it idle at any time). With more cores
there's definitely a higher chance of high dirty rates, especially with
the larger memory pool.
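To make the concern concrete, here is a rough back-of-the-envelope model of
iterative pre-copy (plain Python; the rates and sizes below are illustrative
assumptions of mine, not measurements from the series):

```python
# Rough model of iterative pre-copy convergence (illustrative only).
# Each iteration resends the pages dirtied during the previous iteration;
# the dirty set only shrinks while dirty_rate < bandwidth.

def simulate_precopy(ram_bytes, dirty_rate, bandwidth, max_downtime_s,
                     max_iters=50):
    """Return (converged, iterations, final_transfer_time_s)."""
    remaining = ram_bytes                  # bytes left to send this round
    for it in range(1, max_iters + 1):
        xfer_time = remaining / bandwidth
        if xfer_time <= max_downtime_s:
            return True, it, xfer_time     # can stop the VM and finish
        # While we were sending, the guest dirtied more memory.
        remaining = min(ram_bytes, dirty_rate * xfer_time)
        if dirty_rate >= bandwidth:
            break                          # dirty set no longer shrinks
    return False, max_iters, remaining / bandwidth

GiB = 1 << 30
MiB = 1 << 20

# A mostly dormant 32 GiB guest converges in a few iterations despite its
# size (50 MiB/s dirty rate, 1 GiB/s link, 300 ms downtime budget).
ok, iters, dt = simulate_precopy(32 * GiB, 50 * MiB, 1 * GiB, 0.3)

# A small guest with aggressive writes (2 GiB/s dirtied, 1 GiB/s link)
# never converges without throttling, regardless of its 4 GiB size.
bad, _, _ = simulate_precopy(4 * GiB, 2 * GiB, 1 * GiB, 0.3)
```

This is exactly the "size matters only through dirty rate" intuition: size
sets the length of the first iteration, while the dirty-rate/bandwidth ratio
decides whether later iterations shrink.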
> Anyway, I will conduct experiments with large VMs.
Thanks.
>
> I think that the easiest way to predict whether VM migration will converge
> is the following. Run calc-dirty-rate with calc-time equal to the desired
> downtime. If it reports that the volume of memory dirtied over the
> calc-time period is larger than what you can copy over the network in the
> same time, then you are out of luck. Alas, at the moment calc-time accepts
> values in units of seconds, while reasonable downtime lies in the range of
> 50-300ms. I am preparing a separate patch that will allow specifying
> calc-time in milliseconds. I hope this approach will be cleaner than the
> array of hardcoded values I introduced in my original patch.
I actually haven't personally gone through the details of the new
interface yet, but what you said sounds reasonable, and I'm happy to read
the new version.
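For what it's worth, the check you describe reduces to a one-line comparison
once calc-dirty-rate can run with a millisecond-scale calc-time. A minimal
sketch (the function name and the sample numbers are mine, not the patch's
interface; dirtied bytes would come from a calc-dirty-rate run with
calc-time equal to the downtime budget):

```python
MiB = 1 << 20
GiB = 1 << 30

def can_meet_downtime(dirtied_bytes, downtime_s, bandwidth_bytes_s):
    # The memory dirtied during one downtime-sized window must fit into
    # what the link can push in that same window; otherwise the final
    # stop-and-copy phase can never finish within the downtime budget.
    return dirtied_bytes <= bandwidth_bytes_s * downtime_s

# Suppose a 300 ms measurement window reported 120 MiB dirtied, and the
# link is limited to 1 GiB/s (so ~307 MiB transferable in 300 ms):
fits = can_meet_downtime(120 * MiB, 0.3, 1 * GiB)        # True
too_hot = can_meet_downtime(500 * MiB, 0.3, 1 * GiB)     # False
```

The appeal of this formulation is that it needs no hardcoded calc-time
table: the measurement window and the downtime budget are the same number.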
--
Peter Xu
Thread overview: 22+ messages
2023-04-27 12:42 [PATCH v2 0/4] Migration time prediction using calc-dirty-rate Andrei Gudkov via
2023-04-27 12:42 ` [PATCH v2 1/4] migration/calc-dirty-rate: replaced CRC32 with xxHash Andrei Gudkov via
2023-05-10 16:54 ` Juan Quintela
2023-04-27 12:42 ` [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode Andrei Gudkov via
2023-05-10 17:36 ` Juan Quintela
2023-05-12 13:18 ` gudkov.andrei--- via
2023-05-15 8:22 ` Juan Quintela
2023-05-18 14:45 ` gudkov.andrei--- via
2023-05-18 15:13 ` Juan Quintela
2023-05-15 8:23 ` Juan Quintela
2023-05-11 6:14 ` Markus Armbruster
2023-05-12 14:24 ` gudkov.andrei--- via
2023-05-30 3:06 ` Wang, Lei
2023-04-27 12:42 ` [PATCH v2 3/4] migration/calc-dirty-rate: added n-zero-pages metric Andrei Gudkov via
2023-05-10 17:57 ` Juan Quintela
2023-04-27 12:43 ` [PATCH v2 4/4] migration/calc-dirty-rate: tool to predict migration time Andrei Gudkov via
2023-05-10 18:01 ` Juan Quintela
2023-05-30 3:21 ` Wang, Lei
2023-06-02 13:06 ` gudkov.andrei--- via
2023-05-30 15:46 ` [PATCH v2 0/4] Migration time prediction using calc-dirty-rate Peter Xu
2023-05-31 14:46 ` gudkov.andrei--- via
2023-05-31 15:03 ` Peter Xu [this message]