[Qemu-devel] Migration auto-converge problem

* [Qemu-devel] Migration auto-converge problem
@ 2015-03-02 21:04 Jason J. Herne
  2015-03-11 19:47 ` Jason J. Herne
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Jason J. Herne @ 2015-03-02 21:04 UTC
  To: qemu-devel@nongnu.org qemu-devel, Christian Borntraeger

We have a test case that dirties memory very, very quickly. When we run
this test case in a guest and attempt a migration, the migration never
converges, even with auto-converge enabled.

Qemu's auto-converge behaves differently than I had expected. In my
mind, auto-converge would continuously apply adaptive throttling to the
CPU utilization of a busy guest whenever Qemu detects that the guest
memory transfer is not making progress quickly enough. The idea is that a
guest dirtying pages too quickly would be adaptively slowed down by the
throttling until migration can transfer pages fast enough to complete
within the max downtime. Qemu's current auto-converge does not appear to
do this in practice.

A quick look at the source code shows the following:
- Auto-converge keeps a counter. This counter is only incremented if, for
a completed memory pass, the guest is dirtying pages at a rate of 50%
(or more) of our transfer rate.
- The counter increments at most once per pass through memory.
- The counter must reach 4 before any throttling is done (so a minimum of
4 memory passes has to occur).
- Once the counter reaches 4, it is immediately reset to 0, and then
throttling action is taken.
- Throttling consists of an async sleep on each guest CPU for 30ms,
exactly once.
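To make the counter logic above concrete, here is a small C model of it as the bullet points describe it. This is a sketch, not Qemu's source: the names (`autoconverge_end_of_pass`, `HIGH_DIRTY_PASSES_NEEDED`, and so on) are hypothetical, and it assumes the dirtied and transferred byte counts cover the same pass, so comparing bytes is equivalent to comparing rates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the behaviour described in the bullet list;
 * not Qemu's actual code or identifiers. */
enum {
    HIGH_DIRTY_PASSES_NEEDED = 4,  /* counter must reach this before throttling */
    THROTTLE_SLEEP_MS        = 30  /* one-shot sleep applied to each guest CPU */
};

struct autoconverge_state {
    int high_dirty_passes;  /* passes where dirtying >= 50% of transfer */
};

/* Call once per completed pass over guest memory.  Both byte counts are
 * assumed to cover the same pass, so comparing them compares rates.
 * Returns how long each guest CPU should sleep: 0 or THROTTLE_SLEEP_MS. */
static int autoconverge_end_of_pass(struct autoconverge_state *s,
                                    uint64_t bytes_dirtied,
                                    uint64_t bytes_transferred)
{
    assert(s);
    /* At most one increment per pass, and only for "high" dirty rates. */
    if (bytes_dirtied * 2 >= bytes_transferred) {
        s->high_dirty_passes++;
    }
    if (s->high_dirty_passes >= HIGH_DIRTY_PASSES_NEEDED) {
        s->high_dirty_passes = 0;   /* reset immediately... */
        return THROTTLE_SLEEP_MS;   /* ...then throttle exactly once */
    }
    return 0;
}
```

Nothing in this model ever raises the sleep above 30ms, no matter how many passes in a row fail to make headway; the throttle is the same size for a mildly busy guest and for a pathological one.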

Now consider the scenario auto-converge is meant to solve (I think): a
guest touching lots of memory very quickly. Each pass through memory will
send a lot of pages and thus take a decent amount of time to complete.
If, for every four passes, we are *only* sleeping the guest for 30ms, our
guest will still be able to dirty pages faster than we can transfer them.
We will never catch up because the sleep time is very, very small
relative to guest execution time.
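A rough way to quantify "very small": if the guest sleeps 30ms once every four passes, the fraction of wall-clock time it spends asleep is 30 / (4 * pass_time + 30). The helper name is hypothetical and the pass duration is an assumed input; the thread does not give a figure for it.

```c
#include <assert.h>

/* Fraction of wall-clock time a guest CPU spends asleep when it sleeps
 * sleep_ms once every `passes` memory passes of pass_ms each.
 * pass_ms is an assumed input; it depends on RAM size and link speed. */
static double throttle_sleep_fraction(double sleep_ms, int passes,
                                      double pass_ms)
{
    assert(passes > 0 && pass_ms > 0.0 && sleep_ms >= 0.0);
    return sleep_ms / (passes * pass_ms + sleep_ms);
}
```

With one-second passes, `throttle_sleep_fraction(30.0, 4, 1000.0)` is 30/4030, roughly 0.74%, so the guest keeps running more than 99% of the time.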

Auto-converge, as it is implemented today, does not address the problem
I expect it to solve. However, after rapidly prototyping a new version of
auto-converge that performs adaptive modeling, I've learned something.
The workload I'm attempting to migrate is actually a pathological case.
It is an excellent example of why throttling the CPU is not always a good
method of limiting memory access. In this test case we are able to touch
over 600 MB of pages in 50 ms of continuous execution. Even if I throttle
the guest to 5% (50ms runtime, 950ms sleep), we still cannot come close
to catching up, even with a fairly speedy network link (which not every
user will have).
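The arithmetic behind this is worth spelling out. Touching 600 MB in 50 ms of run time is about 12,000 MB/s while the guest runs; at a 5% duty cycle that still averages about 600 MB/s. The helper below is hypothetical and assumes, as CPU throttling itself does, that the dirty rate scales linearly with the fraction of time the guest is allowed to run.

```c
#include <assert.h>

/* Average dirty rate in MB/s of a guest that dirties `mb` megabytes in
 * `run_ms` of continuous execution, when throttled so that it only runs
 * for `run_fraction` (0..1) of wall-clock time.  Assumes the dirty rate
 * scales linearly with run time, which is the premise of CPU throttling. */
static double throttled_dirty_rate_mb_s(double mb, double run_ms,
                                        double run_fraction)
{
    assert(run_ms > 0.0 && run_fraction >= 0.0 && run_fraction <= 1.0);
    double unthrottled = mb / (run_ms / 1000.0);  /* MB per second of run time */
    return unthrottled * run_fraction;
}
```

`throttled_dirty_rate_mb_s(600.0, 50.0, 0.05)` comes out at about 600 MB/s, nearly five times what a 1 Gbit/s link (about 125 MB/s before protocol overhead) can carry.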

Given the above, I believe that some workloads touch memory too fast and
we'll never be able to live migrate them with auto-converge. On the
lower end there are workloads with a very small or stagnant working set,
which will be live migratable without the need for auto-converge.
Lastly, we have "the nebulous middle". These are workloads that would
benefit from auto-converge because they touch pages too fast for
migration to deal with them, AND (important conditional here) throttling
will (may?) actually reduce their rate of page modifications. I would
like to try to define this "middle" set of workloads.

A question with no obvious answer: how much throttling is acceptable? If
I have to throttle a guest 90% and he ends up failing 75% of whatever
transactions he is attempting to process, then we have quite likely
defeated the entire purpose of "live" migration. Perhaps it would be
better in this case to just stop the guest and do a non-live migration.
Maybe by reverting to non-live we actually save time, and thus more
transactions would have completed. This one may take some experimenting
to get a good idea of what makes the most sense. Maybe the maximum
throttling should even be user configurable.
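One way to frame "how much throttling is acceptable?" is as a decision against a user-set cap. The sketch below is hypothetical (the cap is only a suggestion in this mail, and `choose_throttle` is an invented name) and again assumes the dirty rate scales linearly with the guest's run fraction. It computes the break-even throttle at which the dirty rate drops to the link bandwidth, and falls back to a non-live, stop-and-copy migration when that exceeds the cap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decision helper; assumes dirty rate scales linearly with
 * the fraction of time the guest is allowed to run. */
struct throttle_decision {
    double throttle;       /* fraction of time the guest is held asleep, 0..1 */
    bool   stop_and_copy;  /* true: give up on live migration */
};

static struct throttle_decision choose_throttle(double dirty_mb_s,      /* unthrottled */
                                                double bandwidth_mb_s,
                                                double max_throttle)    /* user cap, 0..1 */
{
    assert(dirty_mb_s >= 0.0 && bandwidth_mb_s > 0.0);
    assert(max_throttle >= 0.0 && max_throttle <= 1.0);
    struct throttle_decision d = { 0.0, false };
    if (dirty_mb_s < bandwidth_mb_s) {
        return d;  /* link already outpaces the guest: no throttle needed */
    }
    /* Break-even: (1 - throttle) * dirty == bandwidth. */
    double needed = 1.0 - bandwidth_mb_s / dirty_mb_s;
    if (needed > max_throttle) {
        d.throttle = max_throttle;
        d.stop_and_copy = true;  /* cap exceeded: stop the guest, migrate non-live */
    } else {
        d.throttle = needed;
    }
    return d;
}
```

The break-even value only equalizes the two rates; a real controller would need some margin beyond it for the remaining dirty set to actually shrink.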

With all this said, I still wonder exactly how big this "nebulous
middle" really is. If, in practice, that "middle" only accounts for 1%
of the workloads out there, is it really worth spending time fixing it?
Keep in mind this is a two-pronged test:
1. The guest cannot migrate because it changes memory too fast.
2. CPU throttling slows the guest's memory writes down enough that he
can now migrate.
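The two-pronged test maps onto the three groups of workloads described above. In this hypothetical classifier the inputs are assumed to be measured rather than modeled: the dirty rate with no throttling, the dirty rate at the maximum throttle we are willing to apply (measured, because throttling may not reduce it), and the migration bandwidth, all in the same units.

```c
#include <assert.h>

enum workload_class {
    NO_THROTTLE_NEEDED,  /* small or stagnant working set: fails prong 1 */
    NEBULOUS_MIDDLE,     /* passes both prongs: auto-converge helps */
    CANNOT_CONVERGE      /* fails prong 2: throttling does not help enough */
};

/* Hypothetical classifier; all rates in the same unit (e.g. MB/s). */
static enum workload_class classify_workload(double dirty_unthrottled,
                                             double dirty_at_max_throttle,
                                             double bandwidth)
{
    assert(bandwidth > 0.0);
    /* Prong 1: does the guest change memory too fast to migrate at all? */
    if (dirty_unthrottled < bandwidth) {
        return NO_THROTTLE_NEEDED;
    }
    /* Prong 2: does CPU throttling bring the dirty rate under the link? */
    if (dirty_at_max_throttle < bandwidth) {
        return NEBULOUS_MIDDLE;
    }
    return CANNOT_CONVERGE;
}
```

Comparing against raw bandwidth is a simplification; in practice the dirty rate has to fall far enough below the link rate for the final remainder to fit in the allowed downtime.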

I'm interested in any thoughts anyone has. Thanks!

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)

* Re: [Qemu-devel] Migration auto-converge problem
  2015-03-02 21:04 [Qemu-devel] Migration auto-converge problem Jason J. Herne
@ 2015-03-11 19:47 ` Jason J. Herne
  2015-03-11 23:23 ` John Snow
  2015-03-12 10:32 ` Dr. David Alan Gilbert
  2 siblings, 0 replies; 4+ messages in thread
From: Jason J. Herne @ 2015-03-11 19:47 UTC
  To: qemu-devel@nongnu.org qemu-devel, Christian Borntraeger, quintela,
	amit.shah

On 03/02/2015 04:04 PM, Jason J. Herne wrote:
> [original message snipped]

Ping. Just wondering whether anyone has any thoughts on this issue.

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)

* Re: [Qemu-devel] Migration auto-converge problem
  2015-03-02 21:04 [Qemu-devel] Migration auto-converge problem Jason J. Herne
  2015-03-11 19:47 ` Jason J. Herne
@ 2015-03-11 23:23 ` John Snow
  2015-03-12 10:32 ` Dr. David Alan Gilbert
  2 siblings, 0 replies; 4+ messages in thread
From: John Snow @ 2015-03-11 23:23 UTC
  To: jjherne, qemu-devel@nongnu.org qemu-devel, Christian Borntraeger
  Cc: Dr. David Alan Gilbert, Juan Jose Quintela Carreira

On 03/02/2015 04:04 PM, Jason J. Herne wrote:
> [original message snipped]

This is just a passing thought since I have not invested deeply in the 
live migration convergence mechanisms myself, but:

Is it possible to apply a progressively more brutish throttle to a guest 
if we detect we are not making (or indeed /losing/) progress?

We could start with no throttle and see how far we get, then 
progressively apply a tighter grip on the VM until we make satisfactory 
progress, then continue on until we hit our "Just pause it and ship the 
rest" threshold.

That way we allow ourselves to throttle very naughty guests very
aggressively (even to the point of effectively pausing them) without
disturbing our nice, largely idle guests. In this way, even very high
throttle caps should be acceptable.

This would allow live migration to "fail gracefully" back to what is
essentially a paused migration for guests that modify memory or disk
just too absurdly fast.
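John's proposal can be sketched as a per-pass controller. This is a hypothetical illustration, not an existing Qemu interface: it assumes "progress" means the amount of dirty memory left to send shrank since the previous pass, and it tightens the throttle by a fixed step each time that fails, up to a ceiling at which the guest is simply paused and the rest shipped.

```c
#include <assert.h>
#include <stdint.h>

enum migration_action {
    KEEP_GOING,              /* stay live, possibly with a tighter throttle */
    FINISH_WITHIN_DOWNTIME,  /* remainder fits in the downtime budget */
    PAUSE_AND_SHIP_REST      /* throttle hit the ceiling: give up on live */
};

struct progressive_throttle {
    double   throttle;         /* fraction of time guest CPUs are held asleep */
    double   increment;        /* tightening step after a pass with no progress */
    double   pause_threshold;  /* "just pause it and ship the rest" ceiling */
    uint64_t prev_remaining;   /* dirty bytes left after last pass; 0 = none yet */
};

/* Call once per pass with the dirty bytes still to send and the number of
 * bytes that could be sent within the allowed downtime. */
static enum migration_action progressive_step(struct progressive_throttle *p,
                                              uint64_t remaining_bytes,
                                              uint64_t downtime_budget_bytes)
{
    assert(p);
    if (remaining_bytes <= downtime_budget_bytes) {
        return FINISH_WITHIN_DOWNTIME;
    }
    /* No progress (or lost progress) since last pass: tighten the grip. */
    if (p->prev_remaining != 0 && remaining_bytes >= p->prev_remaining) {
        p->throttle += p->increment;
        if (p->throttle >= p->pause_threshold) {
            p->throttle = p->pause_threshold;
            return PAUSE_AND_SHIP_REST;
        }
    }
    p->prev_remaining = remaining_bytes;
    return KEEP_GOING;
}
```

Idle guests never see the throttle move off zero; only guests that keep winning the race get squeezed. A fixed increment is the simplest choice; a multiplicative step would reach the ceiling sooner for very naughty guests.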

I'll leave it to the migration wizards to explain why I am foolhardy.
--js

* Re: [Qemu-devel] Migration auto-converge problem
  2015-03-02 21:04 [Qemu-devel] Migration auto-converge problem Jason J. Herne
  2015-03-11 19:47 ` Jason J. Herne
  2015-03-11 23:23 ` John Snow
@ 2015-03-12 10:32 ` Dr. David Alan Gilbert
  2 siblings, 0 replies; 4+ messages in thread
From: Dr. David Alan Gilbert @ 2015-03-12 10:32 UTC
  To: Jason J. Herne; +Cc: Christian Borntraeger, qemu-devel@nongnu.org qemu-devel

* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> We have a test case that dirties memory very, very quickly. When we run this
> test case in a guest and attempt a migration, the migration never converges,
> even with auto-converge enabled.
> 
> Qemu's auto-converge behaves differently than I had expected. In my mind,
> auto-converge would continuously apply adaptive throttling to the CPU
> utilization of a busy guest whenever Qemu detects that the guest memory
> transfer is not making progress quickly enough.

Yes; it's not as 'auto' as you might hope for an 'auto converge'.

> The idea is that a guest dirtying pages too quickly would be adaptively
> slowed down by the throttling until migration can transfer pages fast
> enough to complete within the max downtime. Qemu's current auto-converge
> does not appear to do this in practice.
> 
> A quick look at the source code shows the following:
> - Auto-converge keeps a counter. This counter is only incremented if, for a
> completed memory pass, the guest is dirtying pages at a rate of 50% (or
> more) of our transfer rate.
> - The counter increments at most once per pass through memory.
> - The counter must reach 4 before any throttling is done (so a minimum of 4
> memory passes has to occur).
> - Once the counter reaches 4, it is immediately reset to 0, and then
> throttling action is taken.
> - Throttling consists of an async sleep on each guest CPU for 30ms,
> exactly once.
> 
> Now consider the scenario auto-converge is meant to solve (I think): a guest
> touching lots of memory very quickly. Each pass through memory will send a
> lot of pages and thus take a decent amount of time to complete. If, for
> every four passes, we are *only* sleeping the guest for 30ms, our guest will
> still be able to dirty pages faster than we can transfer them. We will never
> catch up because the sleep time is very, very small relative to guest
> execution time.

And the problem is that magic numbers vary wildly in usefulness
depending on CPU speed, memory bandwidth, link bandwidth/type, load type and
the phase of the moon.

> Auto-converge, as it is implemented today, does not address the problem I
> expect it to solve. However, after rapidly prototyping a new version of
> auto-converge that performs adaptive modeling, I've learned something. The
> workload I'm attempting to migrate is actually a pathological case. It is an
> excellent example of why throttling the CPU is not always a good method of
> limiting memory access. In this test case we are able to touch over 600 MB
> of pages in 50 ms of continuous execution. Even if I throttle the guest to
> 5% (50ms runtime, 950ms sleep), we still cannot come close to catching up,
> even with a fairly speedy network link (which not every user will have).

Right; the worst case is very bad: touching one line in each page can
be done with very little CPU. However, you do hope that most real-world
apps aren't quite that bad, and if one is that bad then turning XBZRLE on
should help.
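Why XBZRLE helps in the one-line-per-page case, assuming 64-byte cache lines and 4 KiB pages: a single 64-byte write dirties the whole page, so a plain precopy pass resends 4096 bytes, whereas a delta against the previously sent copy (XBZRLE keeps a cache of those) only needs the changed bytes plus some framing. The sketch below is a simplified stand-in, not XBZRLE's real wire format: it finds runs of changed bytes (the non-zero runs of old XOR new) and charges a hypothetical 4 bytes of framing per run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096  /* assumed 4 KiB pages */

/* Size of a simplified delta of new_page against old_page: every maximal
 * run of changed bytes costs a 2-byte skip field, a 2-byte length field
 * and the new bytes themselves.  Unchanged bytes cost nothing.  This only
 * illustrates the idea behind XBZRLE; Qemu's actual encoding differs. */
static size_t delta_size(const uint8_t *old_page, const uint8_t *new_page)
{
    assert(old_page && new_page);
    size_t size = 0;
    size_t i = 0;
    while (i < GUEST_PAGE_SIZE) {
        if (old_page[i] == new_page[i]) {
            i++;                      /* unchanged byte: skipped for free */
            continue;
        }
        size_t start = i;
        while (i < GUEST_PAGE_SIZE && old_page[i] != new_page[i]) {
            i++;                      /* extend the run of changed bytes */
        }
        size += 2 + 2 + (i - start);  /* skip + length + payload */
    }
    return size;
}
```

For a page where exactly one 64-byte line changed, `delta_size()` returns 68 bytes instead of the 4096 a full resend would cost.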

> Given the above, I believe that some workloads touch memory too fast and
> we'll never be able to live migrate them with auto-converge. On the lower
> end there are workloads with a very small or stagnant working set, which
> will be live migratable without the need for auto-converge. Lastly, we have
> "the nebulous middle". These are workloads that would benefit from
> auto-converge because they touch pages too fast for migration to deal with
> them, AND (important conditional here) throttling will (may?) actually
> reduce their rate of page modifications. I would like to try to define this
> "middle" set of workloads.
> 
> A question with no obvious answer: how much throttling is acceptable? If I
> have to throttle a guest 90% and he ends up failing 75% of whatever
> transactions he is attempting to process, then we have quite likely defeated
> the entire purpose of "live" migration. Perhaps it would be better in this
> case to just stop the guest and do a non-live migration. Maybe by reverting
> to non-live we actually save time, and thus more transactions would have
> completed. This one may take some experimenting to get a good idea of what
> makes the most sense. Maybe the maximum throttling should even be user
> configurable.

But then you could just increase the 'max-downtime' setting, which
produces similar behaviour. Of course the reality is that users want no
appreciable throttling and instant migration; something has to give.
Postcopy is one answer, but it still causes a performance degradation.
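Why raising max-downtime behaves like a throttle cap: precopy can stop iterating once the remaining dirty memory can be sent within the allowed downtime at the measured bandwidth, so a larger downtime lets a guest that never fully catches up finish anyway. The sketch below is a simplified form of that exit condition, with hypothetical names and the assumption that bandwidth during the final stop-and-copy matches what was measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified precopy exit check (hypothetical helper): can the remaining
 * dirty memory be sent, with the guest stopped, within max_downtime_ms?
 * Assumes the final copy runs at the measured bandwidth. */
static bool can_finish_within(uint64_t remaining_bytes,
                              double bandwidth_bytes_per_ms,
                              double max_downtime_ms)
{
    assert(bandwidth_bytes_per_ms > 0.0);
    double expected_downtime_ms = (double)remaining_bytes / bandwidth_bytes_per_ms;
    return expected_downtime_ms <= max_downtime_ms;
}
```

At 125 MB/s (125,000 bytes/ms), 62.5 MB of remaining dirty memory needs 500 ms: that fails a 300 ms limit but passes a 1000 ms one, at the cost of a guest pause twice as long as 300 ms would have allowed.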

> With all this said, I still wonder exactly how big this "nebulous middle"
> really is. If, in practice, that "middle" only accounts for 1% of the
> workloads out there, is it really worth spending time fixing it? Keep in
> mind this is a two-pronged test:
> 1. The guest cannot migrate because it changes memory too fast.
> 2. CPU throttling slows the guest's memory writes down enough that he can
> now migrate.
> 
> I'm interested in any thoughts anyone has. Thanks!

It doesn't really matter what percentage of workloads it is, as long
as real-world workloads hit it (rather than just evil worst cases); 1% still
means that a lot of people are going to hit it, and you can never usefully
estimate just what that percentage is.

I think it's certainly worth improving auto-converge; but as you say it's
very difficult to know just how many people it helps.

Dave

> -- 
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
