Message-ID: <4A143D1D.9060800@redhat.com>
Date: Wed, 20 May 2009 20:25:49 +0300
From: Uri Lublin
Subject: Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
References: <1242731347-1558-1-git-send-email-uril@redhat.com> <4A12AD80.701@codemonkey.ws> <20090519144101.GA16372@shell.devel.redhat.com> <4A12F5F6.2010003@codemonkey.ws>
In-Reply-To: <4A12F5F6.2010003@codemonkey.ws>
To: Anthony Liguori
Cc: Glauber Costa, Dor Laor, qemu-devel@nongnu.org

On 05/19/2009 09:09 PM, Anthony Liguori wrote:
> Glauber Costa wrote:
>> Another possibility is for the management tool to increase the
>> bandwidth for little periods if it perceives that no progress is
>> being made.
>>
>> Anyhow, I completely agree that we should not introduce this in qemu.
>>
>> However, maybe we could augment our "info migrate" to provide more
>> info about the internal state of migration, so the mgmt tool can
>> take a more informed decision?
> Yes, I've also suggested this before. I'm willing to expose just about
> any metric that makes sense.
> We need to be careful about not exposing
> implementation details, but things like iteration count, last working
> set size, average working set size, etc. should all be relatively stable
> metrics even if the implementation changes.
>

I agree we need to provide more information via "info migrate".
That's not enough though. In addition to augmenting "info migrate", we
need to add more monitor commands to set/change migration parameters
(e.g. the current bandwidth limit), and change the migration code to act
according to such parameters. These commands should take effect whether
they are issued before or during migration.

Note that since a management tool will most likely call "info migrate"
periodically, it may miss some of the (current/last) statistics.
Also, the first iteration may bias some of the averages.

How would you recognize "stuck" migrations? By comparing Average
Working Set Size and Average Iteration Transfer Size? Counting the
number of no-progress iterations? The average "regression" of
no-progress iterations?

The no-progress convergence rule was fairly easy to implement and gave a
pretty good heuristic for recognizing that the migration is stuck.

Regards,
    Uri.
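
Illustrative sketch (not taken from the patch under discussion): one
possible reading of "counting the number of no-progress iterations" on
the management-tool side. It assumes the tool can sample, once per
iteration, how much dirty RAM remains to be sent. An iteration counts as
"no progress" when that amount did not shrink compared to the previous
iteration. The function name, the input list and the threshold of 3 are
all hypothetical.

```python
def is_stuck(remaining_per_iteration, max_no_progress=3):
    """Return True once max_no_progress iterations (cumulative, not
    necessarily consecutive) failed to reduce the remaining dirty RAM.

    remaining_per_iteration: hypothetical samples of the remaining dirty
    RAM (in bytes or pages), one value per migration iteration.
    """
    no_progress = 0
    previous = None
    for remaining in remaining_per_iteration:
        # No progress: the guest dirtied memory at least as fast as we sent it.
        if previous is not None and remaining >= previous:
            no_progress += 1
            if no_progress >= max_no_progress:
                return True
        previous = remaining
    return False


if __name__ == "__main__":
    # Remaining RAM keeps shrinking: the migration is converging.
    print(is_stuck([100, 80, 60, 40]))
    # Remaining RAM grows for three iterations: flagged as stuck.
    print(is_stuck([100, 90, 95, 96, 97]))
```

A tool sampling less often than once per iteration would miss some
iterations, which is the limitation of periodic polling noted above.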