In-Reply-To: <4A143659.6050108@redhat.com>
References: <1242731347-1558-1-git-send-email-uril@redhat.com>
 <4A12AD80.701@codemonkey.ws>
 <20090519144101.GA16372@shell.devel.redhat.com>
 <4A12C942.4090708@redhat.com>
 <20090519150927.GB16372@shell.devel.redhat.com>
 <4A12F7D5.7050608@codemonkey.ws>
 <4A143659.6050108@redhat.com>
Date: Wed, 20 May 2009 20:28:45 +0300
Subject: Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
From: Blue Swirl
List-Id: qemu-devel.nongnu.org
To: Uri Lublin
Cc: Glauber Costa, Dor Laor, qemu-devel@nongnu.org

On 5/20/09, Uri Lublin wrote:
> On 05/19/2009 09:17 PM, Anthony Liguori wrote:
>
> > Glauber Costa wrote:
> >
> > > On Tue, May 19, 2009 at 05:59:14PM +0300, Dor Laor wrote:
> > >
> > > > We can also make it configurable using the monitor migrate command.
> > > > For example:
> > > > migrate -d -no_progress -threshold=x tcp:....
> > > >
> > > it can be done, but it fits better as a different monitor command
> > >
> > > anthony, do you have any strong opinions here, or is this scheme
> > > acceptable?
> > >
> >
> > Threshold is a bad metric. There's no way to choose a right number. If
> > we were going to have a means to support metrics-based forced
> > convergence (and I really think this belongs in libvirt) I'd rather see
> > something based on bandwidth or wall clock time.
> >
> > Let me put it this way, why 50? What were the guidelines for choosing
> > that number and how would you explain what number a user should choose?
> >
>
> I've changed the threshold of the first convergence rule, to 50 from 10.
> Why 10 ? For this rule the threshold (number of dirty pages) and the number
> of bytes to transfer are equivalent.
>
> 50 pages is about 200K, which can be still sent quickly.
> I've added debug messages and noticed we never hit a number smaller than 10
> (excluding 0). The truth is there were very little number of runs with less
> than 50 dirty pages too. I don't mind leaving it at 10 (should be
> configurable too).
>
> For the second migration convergence rule I've set the limit to 10, as it
> seems much larger than what I've needed (all the runs I've made a number of
> 2-4 no-progress iterations was good enough, as it seems to have a repetitive
> behavior later), but I've enlarged it "just in case". No real research work
> was done here.
>
> Note that a no-progress iteration depends on both network bandwidth and
> guest actions.

Instead of freezing the guest or aborting the migration, the guest
could be throttled a bit, either by giving it less CPU time relative to
the migration or by imposing a small delay on each page-dirtying write
access. Maybe this method would find the balance faster.
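
To make the comparison concrete, here is a rough toy model of the idea.
It is not QEMU code and not taken from the patch: the function name,
its parameters, and the definition of a "no-progress iteration" (an
iteration after which the dirty-page count has not decreased) are all
assumptions for illustration, as is the 4 KiB page size, which is
consistent with "50 pages is about 200K". The throttle factor stands in
for either mechanism suggested above (less CPU time, or a delay per
dirtying write).

```python
PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages


def simulate(dirty_rate, bandwidth, initial_dirty, threshold=50,
             max_no_progress=10, throttle=1.0, max_iters=1000):
    """Toy model of iterative pre-copy (hypothetical, not QEMU's code).

    All quantities are in pages per iteration.
    dirty_rate      -- pages the unthrottled guest dirties per iteration
    bandwidth       -- pages the migration can send per iteration
    threshold       -- rule 1: finish once this few dirty pages remain
    max_no_progress -- rule 2: give up iterating after this many
                       no-progress iterations
    throttle        -- factor applied to the guest's dirty rate after
                       each no-progress iteration (1.0 = no throttling)
    Returns (reason, iterations, remaining_dirty_pages).
    """
    dirty = initial_dirty
    rate = float(dirty_rate)
    no_progress = 0
    for i in range(1, max_iters + 1):
        # Rule 1: few enough dirty pages left to send in the final stage.
        if dirty <= threshold:
            return ("threshold", i, dirty)
        sent = min(dirty, bandwidth)
        new_dirty = dirty - sent + int(rate)
        if new_dirty >= dirty:
            # No progress: the guest dirtied at least as much as was sent.
            no_progress += 1
            rate *= throttle  # slow the guest down a bit
        dirty = new_dirty
        # Rule 2: too many no-progress iterations, force convergence.
        if no_progress >= max_no_progress:
            return ("no_progress", i, dirty)
    return ("max_iters", max_iters, dirty)


if __name__ == "__main__":
    print(simulate(100, 1000, 10000))                # no throttling
    print(simulate(100, 1000, 10000, throttle=0.5))  # halve rate when stuck
```

In this model a guest that dirties 100 pages per iteration settles at
100 dirty pages and never reaches a threshold of 50, so only the
no-progress rule ends the loop. With the throttle set to 0.5 the dirty
rate drops after the first stuck iteration and the threshold rule is
reached instead. Real guests are of course not this regular.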