From: Uri Lublin
Date: Wed, 20 May 2009 19:56:57 +0300
Subject: Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
To: Anthony Liguori
Cc: Glauber Costa, Dor Laor, qemu-devel@nongnu.org
Message-ID: <4A143659.6050108@redhat.com>
In-Reply-To: <4A12F7D5.7050608@codemonkey.ws>

On 05/19/2009 09:17 PM, Anthony Liguori wrote:
> Glauber Costa wrote:
>> On Tue, May 19, 2009 at 05:59:14PM +0300, Dor Laor wrote:
>>
>>> We can also make it configurable using the monitor migrate command.
>>> For example:
>>> migrate -d -no_progress -threshold=x tcp:....
>> it can be done, but it fits better as a different monitor command
>>
>> anthony, do you have any strong opinions here, or is this scheme
>> acceptable?
>
> Threshold is a bad metric. There's no way to choose a right number.
> If we were going to have a means to support metrics-based forced
> convergence (and I really think this belongs in libvirt) I'd rather see
> something based on bandwidth or wall clock time.
>
> Let me put it this way, why 50? What were the guidelines for choosing
> that number and how would you explain what number a user should choose?

I've changed the threshold of the first convergence rule to 50 from 10.
Why 10? For this rule, the threshold (number of dirty pages) and the
number of bytes to transfer are equivalent, since each dirty page is a
fixed-size transfer: 50 pages is about 200K (with 4K pages), which can
still be sent quickly. I added debug messages and noticed we never hit a
number smaller than 10 (excluding 0). In truth, very few runs had fewer
than 50 dirty pages either. I don't mind leaving it at 10 (it should be
configurable too).

For the second convergence rule, I've set the limit to 10 no-progress
iterations. That seems much larger than what I needed: in all the runs I
made, 2-4 no-progress iterations were enough, since the behavior seems to
repeat after that. I enlarged it "just in case". No real research work
was done here.

Note that whether an iteration makes no progress depends on both network
bandwidth and guest activity.

Regards,
Uri.
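A minimal C sketch of how the two convergence rules discussed in this thread could be combined into a per-iteration check. It is not the code from the patch: `should_converge`, `dirty_bytes`, `struct convergence_state` and the constant names are hypothetical. It also assumes that "no progress" means the dirty-page count did not shrink since the previous iteration, that the no-progress counter is cumulative (never reset), that rule 1 uses a strict "fewer than" comparison, and that pages are 4 KiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning knobs; values taken from the discussion. */
#define DIRTY_PAGES_THRESHOLD      50 /* rule 1: dirty set small enough   */
#define MAX_NO_PROGRESS_ITERATIONS 10 /* rule 2: stop waiting for progress */

/* Hypothetical per-migration state, not QEMU's actual structures. */
struct convergence_state {
    uint64_t prev_dirty_pages;  /* dirty pages left after the previous pass */
    unsigned no_progress_count; /* passes so far that did not shrink it
                                   (assumed cumulative, never reset) */
    bool     first_pass_done;   /* no previous pass to compare against yet */
};

/* Bytes still to send for a given number of dirty pages. */
uint64_t dirty_bytes(uint64_t dirty_pages, uint64_t page_size)
{
    assert(page_size > 0);
    return dirty_pages * page_size;
}

/*
 * Called after each iteration with the number of pages still dirty.
 * Returns true when the source should stop the guest and send the rest.
 */
bool should_converge(struct convergence_state *s, uint64_t dirty_pages)
{
    /* Rule 1: the remaining dirty set can be sent quickly. */
    if (dirty_pages < DIRTY_PAGES_THRESHOLD) {
        return true;
    }

    /* Rule 2: count passes where the dirty set did not shrink
     * (assumed definition of a "no-progress" iteration). */
    if (s->first_pass_done && dirty_pages >= s->prev_dirty_pages) {
        s->no_progress_count++;
    }
    s->prev_dirty_pages = dirty_pages;
    s->first_pass_done = true;

    return s->no_progress_count >= MAX_NO_PROGRESS_ITERATIONS;
}
```

With 4 KiB pages, `dirty_bytes(50, 4096)` is 204800 bytes (200 KiB), matching the estimate for rule 1. A guest that keeps 1000 pages dirty on every pass never triggers rule 1, so under these assumptions it converges through rule 2 on the eleventh call, once ten no-progress passes have been counted after the first.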