From: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
To: "qemu-devel@nongnu.org qemu-devel" <qemu-devel@nongnu.org>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	quintela@redhat.com, amit.shah@redhat.com
Subject: Re: [Qemu-devel] Migration auto-converge problem
Date: Wed, 11 Mar 2015 15:47:12 -0400
Message-ID: <55009BC0.3010905@linux.vnet.ibm.com>
In-Reply-To: <54F4D076.3040402@linux.vnet.ibm.com>

On 03/02/2015 04:04 PM, Jason J. Herne wrote:
> We have a test case that dirties memory very very quickly. When we run
> this test case in a guest and attempt a migration, that migration never
> converges even when done with auto-converge on.
>
> The auto-converge behavior of Qemu serves a different purpose than I had
> expected. In my mind, auto-converge would continuously apply
> adaptive throttling of the cpu utilization of a busy guest if Qemu
> detects that progress is not being made quickly enough in the guest
> memory transfer. The idea is that a guest dirtying pages too quickly
> will be adaptively slowed down by the throttling until migration is able
> to transfer pages fast enough to complete the migration within the max
> downtime. Qemu's current auto converge does not appear to do this in
> practice.
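>
> To make that expectation concrete, here is a rough sketch of the
> adaptive loop I had in mind. This is purely hypothetical, not Qemu's
> actual code; every name in it is made up:
>
> #include <stdint.h>
>
> /* Hypothetical adaptive throttle, run once per pass over guest memory.
>  * throttle_pct is the fraction of each scheduling window the vcpus
>  * would be forced to sleep. */
> static int throttle_pct;
>
> static void adaptive_throttle(uint64_t dirty_rate, uint64_t xfer_rate)
> {
>     if (dirty_rate > xfer_rate) {
>         /* Guest dirties memory faster than we can send it: clamp harder. */
>         if (throttle_pct < 90) {
>             throttle_pct += 10;
>         }
>     } else if (throttle_pct >= 5) {
>         /* Migration is keeping up: ease off so the guest recovers. */
>         throttle_pct -= 5;
>     }
> }
>
> The point is the feedback loop: the throttle ratchets up until the
> dirty rate falls below the transfer rate, and backs off afterwards.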
>
> A quick look at the source code shows the following:
> - Auto-converge keeps a counter. This counter is only incremented if, for
> a completed memory pass, the guest is dirtying pages at a rate of 50%
> (or more) of our transfer rate.
> - The counter only increments at most once per pass through memory.
> - The counter must reach 4 before any throttling is done. (a minimum of
> 4 memory passes have to occur)
> - Once the counter reaches 4, it is immediately reset to 0, and then
> throttling action is taken.
> - Throttling occurs by doing an async sleep on each guest cpu for 30ms,
> exactly one time.
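>
> In pseudo-C, the current behavior reads roughly like this (a condensed
> paraphrase of the description above; the names are illustrative, and
> throttle_vcpus_once() is a stand-in for the async per-cpu sleep):
>
> #include <stdint.h>
>
> void throttle_vcpus_once(int ms); /* async-sleep each vcpu for ms */
>
> /* Run once per completed pass over guest memory. */
> static void check_autoconverge(uint64_t dirtied_bytes, uint64_t sent_bytes)
> {
>     static int dirty_rate_high_cnt;
>
>     /* Count passes where the guest dirtied >= 50% of what we sent. */
>     if (dirtied_bytes * 2 >= sent_bytes) {
>         dirty_rate_high_cnt++;
>     }
>
>     if (dirty_rate_high_cnt >= 4) {
>         dirty_rate_high_cnt = 0;  /* reset, then take action */
>         throttle_vcpus_once(30);  /* one 30ms sleep, and that's it */
>     }
> }
>
> Note there is no feedback at all: the response is the same fixed 30ms
> sleep regardless of how far behind the migration has fallen.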
>
> Now consider the scenario auto-converge is meant to solve (I think): A
> guest touching lots of memory very quickly. Each pass through memory is
> going to be sending a lot of pages, and thus, taking a decent amount of
> time to complete. If, for every four passes, we are *only* sleeping the
> guest for 30ms, our guest is still going to be able to dirty pages faster
> than we can transfer them. We will never catch up because the sleep time
> relative to guest execution time is very very small.
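>
> To put rough numbers on it (the pass time here is purely an assumed
> figure for illustration): if one pass over guest memory takes ~5
> seconds, then four passes take ~20 seconds of wall time, during which
> each guest cpu is slept for only 30ms. That is a slowdown of
> 0.03s / 20s = 0.15%, which is effectively no throttling at all.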
>
> Auto-converge, as it is implemented today, does not address the problem
> I expected it to solve. However, after rapidly prototyping a new version
> of auto-converge that performs adaptive modeling, I've learned something.
> The workload I'm attempting to migrate is actually a pathological case.
> It is an excellent example of why throttling cpu is not always a good
> method of limiting memory access. In this test case we are able to touch
> over 600 MB of pages in 50 ms of continuous execution. In this case,
> even if I throttle the guest to 5% (50ms runtime, 950ms sleep), we still
> cannot come close to catching up, even with a fairly speedy network
> link (which not every user will have).
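>
> Rough arithmetic on those numbers: 600 MB dirtied per 50ms of runtime
> is an unthrottled dirty rate of 12 GB/s, or roughly 96 Gbit/s. Even
> throttled to 5% (50ms of runtime in every 1000ms), the guest can still
> dirty ~600 MB per second, about 4.8 Gbit/s, so any migration stream
> with less than ~5 Gbit/s of effective throughput never catches up.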
>
> Given the above, I believe that some workloads touch memory too fast and
> we'll never be able to live migrate them with auto-converge. On the
> lower end there are workloads that have a very small/stagnant working
> set size which will be live migratable without the need for
> auto-converge. Lastly, we have "the nebulous middle". These are
> workloads that would benefit from auto-converge because they touch pages
> too fast for migration to be able to deal with them, AND (important
> conditional here), throttling will (may?) actually reduce their rate of
> page modifications. I would like to try and define this "middle" set of
> workloads.
>
> A question with no obvious answer: How much throttling is acceptable? If
> I have to throttle a guest 90% and he ends up failing 75% of whatever
> transactions he is attempting to process, then we have quite likely
> defeated the entire purpose of "live" migration. Perhaps it would be
> better in this case to just stop the guest and do a non-live migration.
> Maybe by reverting to non-live we would actually save time, and thus more
> transactions would complete. This one may take some experimenting
> to get a good idea of what makes the most sense. Maybe even
> make max throttling user configurable.
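>
> A toy comparison (all numbers invented purely for illustration):
> suppose the guest has 16 GB of RAM and the link moves 1 GB/s. A
> stopped, non-live migration is roughly a 16 second outage. A live
> migration that only converges at 90% throttle leaves the guest running
> at 10% speed for however long the transfer takes; if that stretches to
> 5 minutes, the guest has lost the equivalent of ~270 seconds of
> runtime, far worse than taking the 16 second hit up front.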
>
> With all this said, I still wonder exactly how big this "nebulous
> middle" really is. If, in practice, that "middle" only accounts for 1%
> of the workloads out there then is it really worth spending time fixing
> it? Keep in mind this is a two pronged test:
> 1. Guest cannot migrate because it changes memory too fast
> 2. Cpu throttling slows the guest's memory writes down enough such that
> he can now migrate
>
> I'm interested in any thoughts anyone has. Thanks!
>

Ping. Just wondering if anyone has any thoughts on this issue?

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
