Date: Thu, 12 Mar 2015 10:32:23 +0000
From: "Dr. David Alan Gilbert"
Message-ID: <20150312103218.GD2330@work-vm>
In-Reply-To: <54F4D076.3040402@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] Migration auto-converge problem
To: "Jason J. Herne"
Cc: Christian Borntraeger, qemu-devel@nongnu.org

* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> We have a test case that dirties memory very quickly. When we run this
> test case in a guest and attempt a migration, that migration never
> converges, even when done with auto-converge on.
>
> The auto-converge behaviour of QEMU serves a different purpose than I
> had expected. In my mind, I expected auto-converge to continuously apply
> adaptive throttling of the CPU utilization of a busy guest if QEMU
> detects that progress is not being made quickly enough in the guest
> memory transfer.

Yes; it's not as 'auto' as you might hope for an 'auto converge'.

> The idea is that a guest dirtying pages too quickly will be adaptively
> slowed down by the throttling until migration is able to transfer pages
> fast enough to complete the migration within the max downtime.
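[Editor's note: the convergence condition being discussed can be sketched numerically. This is an illustrative model, not QEMU code; the function name and all rates below are made-up example values.]

```python
def passes_to_converge(initial_mb, dirty_rate, transfer_rate, max_downtime):
    """Simulate pre-copy migration passes.

    Each pass sends the currently dirty set; pages dirtied meanwhile form
    the next pass.  Rates are in MB/s, max_downtime in seconds.  Returns
    the number of passes needed, or None if the dirty set never shrinks
    enough (i.e. the guest dirties pages faster than we can send them).
    """
    dirty_mb = initial_mb
    for n in range(1, 1000):
        pass_time = dirty_mb / transfer_rate
        if pass_time <= max_downtime:
            return n               # final stop-the-guest pass fits in downtime
        next_dirty = dirty_rate * pass_time
        if next_dirty >= dirty_mb:
            return None            # dirty/transfer ratio >= 1: never catch up
        dirty_mb = next_dirty
    return None
```

With a dirty rate safely below the transfer rate the dirty set shrinks geometrically each pass; once the dirty rate reaches the transfer rate, no number of passes helps, which is exactly the case throttling is meant to rescue.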
> QEMU's current auto-converge does not appear to do this in practice.
>
> A quick look at the source code shows the following:
> - Auto-converge keeps a counter. This counter is only incremented if,
>   for a completed memory pass, the guest is dirtying pages at a rate of
>   50% (or more) of our transfer rate.
> - The counter increments at most once per pass through memory.
> - The counter must reach 4 before any throttling is done (so a minimum
>   of four memory passes have to occur).
> - Once the counter reaches 4, it is immediately reset to 0, and then the
>   throttling action is taken.
> - Throttling occurs by doing an async sleep on each guest CPU for 30ms,
>   exactly one time.
>
> Now consider the scenario auto-converge is meant to solve (I think): a
> guest touching lots of memory very quickly. Each pass through memory is
> going to send a lot of pages, and thus take a decent amount of time to
> complete. If, for every four passes, we are *only* sleeping the guest
> for 30ms, the guest is still going to be able to dirty pages faster than
> we can transfer them. We will never catch up, because the sleep time is
> very small relative to guest execution time.

And the problem is that magic numbers vary in usefulness wildly depending
on CPU speed, memory bandwidth, link bandwidth/type, load type and the
phase of the moon.

> Auto-converge, as it is implemented today, does not address the problem
> I expected it to solve. However, after rapidly prototyping a new version
> of auto-converge that performs adaptive modeling, I've learned
> something: the workload I'm attempting to migrate is actually a
> pathological case. It is an excellent example of why throttling CPU is
> not always a good method of limiting memory access. In this test case we
> are able to touch over 600 MB of pages in 50 ms of continuous execution.
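[Editor's note: the counter heuristic summarised in the list above can be sketched as follows. This is a paraphrase of the behaviour described in the mail, not the actual QEMU source; names and structure are illustrative.]

```python
DIRTY_RATE_THRESHOLD = 0.50   # dirty rate >= 50% of transfer rate
NEEDED_PASSES = 4             # counter must reach 4 before acting
THROTTLE_SLEEP_MS = 30        # each vCPU async-sleeps once, for 30 ms

class AutoConverge:
    """Sketch of the 2015-era auto-converge heuristic described above."""

    def __init__(self):
        self.counter = 0

    def end_of_pass(self, dirty_rate, transfer_rate):
        """Called at most once per complete pass over guest memory.

        Returns the ms each vCPU should sleep (0 = no throttling).
        """
        if dirty_rate >= DIRTY_RATE_THRESHOLD * transfer_rate:
            self.counter += 1
        if self.counter >= NEEDED_PASSES:
            self.counter = 0            # reset immediately...
            return THROTTLE_SLEEP_MS    # ...then throttle exactly once
        return 0
```

Note how rarely the throttle fires: four full memory passes buy the guest a single 30 ms sleep per vCPU, which is the core of the complaint.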
> In this case, even if I throttle the guest to 5% (50 ms runtime, 950 ms
> sleep) we still cannot come close to catching up, even with a fairly
> speedy network link (which not every user will have).

Right; the worst case is very bad; touching one line in each page can be
done with very little CPU. However, you do hope that most real-world apps
aren't quite that bad, and if one is that bad then turning XBZRLE on
should help.

> Given the above, I believe that some workloads touch memory too fast and
> we'll never be able to live migrate them with auto-converge. At the
> lower end there are workloads with a very small/stagnant working set,
> which will be live migratable without the need for auto-converge.
> Lastly, we have "the nebulous middle": workloads that would benefit from
> auto-converge because they touch pages too fast for migration to be able
> to deal with them, AND (important conditional here) throttling
> will(may?) actually reduce their rate of page modifications. I would
> like to try to define this "middle" set of workloads.
>
> A question with no obvious answer: how much throttling is acceptable? If
> I have to throttle a guest 90% and it ends up failing 75% of whatever
> transactions it is attempting to process, then we have quite likely
> defeated the entire purpose of "live" migration. Perhaps it would be
> better in this case to just stop the guest and do a non-live migration.
> Maybe by reverting to non-live we actually save time, and thus more
> transactions would have completed. This may take some experimenting to
> get a good idea of what makes the most sense. Maybe even make the
> maximum throttling user configurable.

But then you could just increase the 'max-downtime' setting, which
produces similar behaviour. Of course the reality is that users want no
appreciable throttling and instant migration - something has to give.
Postcopy is one answer, but it still gives a performance degradation.
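[Editor's note: the arithmetic behind the pathological case above is worth making explicit. The 600 MB / 50 ms figures come from the mail; the link speeds are assumed example values, and whether a faster link actually helps depends on how much of the touched memory is re-dirtied between passes.]

```python
def throttled_dirty_rate(mb_per_burst, burst_ms, duty_cycle):
    """MB dirtied per wall-clock second at a given CPU duty cycle.

    mb_per_burst MB are touched in burst_ms of continuous execution;
    duty_cycle is the fraction of wall-clock time the guest runs.
    """
    mb_per_run_second = mb_per_burst * (1000.0 / burst_ms)
    return mb_per_run_second * duty_cycle

# The guest touches 600 MB in 50 ms of runtime.  Throttled to 5%
# (50 ms run, 950 ms sleep), that is still 600 MB dirtied per
# wall-clock second:
rate = throttled_dirty_rate(600, 50, 0.05)

# Assumed example links, for comparison:
gigabit_link = 1000 / 8.0      # ~125 MB/s on a 1 Gbps link
ten_gig_link = 10000 / 8.0     # ~1250 MB/s on a 10 Gbps link
```

So even at a 5% duty cycle this guest out-dirties a 1 Gbps link several times over, which is why no amount of the 30 ms-style throttling can rescue it.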
> With all this said, I still wonder exactly how big this "nebulous
> middle" really is. If, in practice, that "middle" only accounts for 1%
> of the workloads out there, then is it really worth spending time fixing
> it? Keep in mind this is a two-pronged test:
> 1. The guest cannot migrate because it changes memory too fast.
> 2. CPU throttling slows the guest's memory writes down enough that it
>    can now migrate.
>
> I'm interested in any thoughts anyone has. Thanks!

It doesn't really matter what percentage of the workloads it is - as long
as real-world workloads hit it (rather than just evil worst cases), 1%
still means that a lot of people are going to hit it; and you can never
usefully approximate just what that percentage is. I think it's certainly
worth improving auto-converge; but as you say, it's very difficult to
know just how many people it helps.

Dave

> --
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK