From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
	"qemu-devel@nongnu.org qemu-devel" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] Migration auto-converge problem
Date: Thu, 12 Mar 2015 10:32:23 +0000
Message-ID: <20150312103218.GD2330@work-vm>
In-Reply-To: <54F4D076.3040402@linux.vnet.ibm.com>

* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> We have a test case that dirties memory very, very quickly. When we run this
> test case in a guest and attempt a migration, the migration never converges,
> even with auto-converge enabled.
> 
> The auto-converge behavior of QEMU serves a different purpose than I had
> expected. In my mind, I expected auto-converge to continuously apply
> adaptive throttling of the CPU utilization of a busy guest if QEMU detects
> that progress is not being made quickly enough in the guest memory transfer.

Yes; it's not as 'auto' as you might hope for an 'auto converge'.

> The idea is that a guest dirtying pages too quickly will be adaptively
> slowed down by the throttling until migration is able to transfer pages fast
> enough to complete the migration within the max downtime. Qemu's current
> auto converge does not appear to do this in practice.
> 
> A quick look at the source code shows the following:
> - Auto-converge keeps a counter. This counter is only incremented if, for a
> completed memory pass, the guest is dirtying pages at a rate of 50% (or
> more) of our transfer rate.
> - The counter only increments at most once per pass through memory.
> - The counter must reach 4 before any throttling is done. (a minimum of 4
> memory passes have to occur)
> - Once the counter reaches 4, it is immediately reset to 0, and then
> throttling action is taken.
> - Throttling occurs by doing an async sleep on each guest CPU for 30ms,
> exactly one time.
> 
> Now consider the scenario auto-converge is meant to solve (I think): A guest
> touching lots of memory very quickly. Each pass through memory is going to
> be sending a lot of pages, and thus, taking a decent amount of time to
> complete. If, for every four passes, we are *only* sleeping the guest for
> 30ms, our guest is still going to be able to dirty pages faster than we can
> transfer them. We will never catch up because the sleep time relative to
> guest execution time is very, very small.

And the problem is that magic numbers vary wildly in usefulness
depending on CPU speed, memory bandwidth, link bandwidth/type, load type, and
the phase of the moon.
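
For reference, the heuristic you describe boils down to roughly the
following (a loose sketch only, modelled on your description rather than
the actual code in arch_init.c; the names, signatures and wiring of the
QEMU helpers are approximate):

    /* Loose sketch of the current heuristic; not the actual QEMU code.
     * async_run_on_cpu(), CPU_FOREACH(), g_usleep() and the iothread mutex
     * calls are existing QEMU/glib helpers, but the wiring is approximate. */
    static int dirty_rate_high_cnt;

    static void mig_sleep_cpu(void *opaque)
    {
        qemu_mutex_unlock_iothread();
        g_usleep(30 * 1000);            /* stall this vCPU for 30ms, once */
        qemu_mutex_lock_iothread();
    }

    /* Called once per completed pass over guest memory */
    static void check_throttling(uint64_t bytes_dirtied, uint64_t bytes_xferred)
    {
        /* Count the pass only if the guest dirtied >= 50% of what we sent */
        if (bytes_dirtied * 2 >= bytes_xferred) {
            dirty_rate_high_cnt++;
        }

        /* Four such passes trigger one throttling action, then we reset */
        if (dirty_rate_high_cnt >= 4) {
            CPUState *cpu;

            dirty_rate_high_cnt = 0;
            CPU_FOREACH(cpu) {
                async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
            }
        }
    }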

> Auto-converge, as it is implemented today, does not address the problem I
> expect it to solve. However, after rapidly prototyping a new version of
> auto-converge that performs adaptive modeling, I've learned something. The
> workload I'm attempting to migrate is actually a pathological case. It is an
> excellent example of why throttling the CPU is not always a good method of
> limiting memory access. In this test case we are able to touch over 600 MB
> of pages in 50 ms of continuous execution. Even if I throttle
> the guest to 5% (50ms runtime, 950ms sleep), we still cannot come close
> to catching up, even with a fairly speedy network link (which not every user
> will have).

Right; the worst case is very bad; touching one line in each page can
be done with very little CPU. However, you do hope that most real-world
apps aren't quite that bad, and if one is, then turning XBZRLE on
should help.
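
To put rough numbers on it (back-of-the-envelope only; the 600MB per 50ms
figure is the one you quote above, and the link speeds are just assumed
for illustration):

    /* Back-of-the-envelope only: the 600MB per 50ms figure is from the
     * workload described above; the link speeds are assumptions. */
    #include <stdio.h>

    int main(void)
    {
        double dirtied_mb  = 600.0;     /* MB dirtied...                      */
        double runtime_s   = 0.050;     /* ...per 50ms of guest execution     */
        double duty_cycle  = 0.05;      /* 5% throttle: 50ms run, 950ms sleep */

        double unthrottled = dirtied_mb / runtime_s;    /* ~12000 MB/s        */
        double throttled   = unthrottled * duty_cycle;  /* ~600 MB/s          */

        printf("unthrottled dirty rate:       ~%.0f MB/s\n", unthrottled);
        printf("dirty rate at 5%% duty cycle:  ~%.0f MB/s\n", throttled);
        printf("1Gbit/s link ~125 MB/s, 10Gbit/s link ~1250 MB/s (raw)\n");
        return 0;
    }

So even at a 5% duty cycle the guest can produce dirty pages at a rate in
the same ballpark as, or above, what many links can actually carry, before
counting the pages still outstanding from earlier passes.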

> Given the above, I believe that some workloads touch memory too fast and
> we'll never be able to live migrate them with auto-converge. On the lower
> end there are workloads that have a very small/stagnant working set size
> which will be live migratable without the need for auto-converge. Lastly, we
> have "the nebulous middle". These are workloads that would benefit from
> auto-converge because they touch pages too fast for migration to be able to
> deal with them, AND (important conditional here) throttling will (may?)
> actually reduce their rate of page modifications. I would like to try and
> define this "middle" set of workloads.
> 
> A question with no obvious answer: How much throttling is acceptable? If I
> have to throttle a guest 90% and he ends up failing 75% of whatever
> transactions he is attempting to process then we have quite likely defeated
> the entire purpose of "live" migration. Perhaps it would be better in this
> case to just stop the guest and do a non-live migration. Maybe by reverting
> to non-live we would actually save time, and thus more transactions would
> complete. This one may take some experimenting to get a good
> idea of what makes the most sense. Maybe even make the max throttling
> user configurable.

But then you could just increase the 'max-downtime' setting, which
produces a similar behaviour.  Of course the reality is that users want no
appreciable throttling and instant migration; something has to give.
Postcopy is one answer, but it still gives a performance degradation.
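
For what it's worth, raising that limit from the monitor looks something
like this (HMP; the value is in seconds, and there is a QMP command of
the same name):

    (qemu) migrate_set_downtime 2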

> With all this said, I still wonder exactly how big this "nebulous middle"
> really is. If, in practice, that "middle" only accounts for 1% of the
> workloads out there, then is it really worth spending time fixing it? Keep in
> mind this is a two-pronged test:
> 1. The guest cannot migrate because it changes memory too fast.
> 2. CPU throttling slows the guest's memory writes down enough that it can
> now migrate.
> 
> I'm interested in any thoughts anyone has. Thanks!

It doesn't really matter what percentage of the workloads it is, as long
as real-world workloads hit it (rather than just evil worst cases); 1% still means
that a lot of people are going to hit it, and you can never usefully estimate
just what that percentage is.

I think it's certainly worth improving auto-converge; but as you say it's
very difficult to know just how many people it helps.
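
If someone does pick it up, one possible direction (purely illustrative,
none of this is existing code, and the knobs would presumably want to be
user configurable) is to drive a throttle percentage from each bitmap
sync instead of issuing a fixed 30ms sleep:

    /* Purely illustrative sketch of a more adaptive scheme. */
    static int throttle_pct;    /* fraction of each period the vCPUs sleep */

    static void update_throttle(uint64_t dirty_bytes_per_sec,
                                uint64_t xfer_bytes_per_sec,
                                int step, int max)
    {
        if (dirty_bytes_per_sec * 2 >= xfer_bytes_per_sec) {
            /* Still falling behind: throttle harder, up to the ceiling */
            throttle_pct = throttle_pct + step > max ? max : throttle_pct + step;
        } else if (throttle_pct >= step) {
            /* Headroom again: back off a little */
            throttle_pct -= step;
        }
        /* ...then have each vCPU sleep throttle_pct% of every period until
         * the next sync revises the figure. */
    }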

Dave

> -- 
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
