From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1M6pYF-0004Xu-VR
	for qemu-devel@nongnu.org; Wed, 20 May 2009 13:25:59 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1M6pYB-0004VZ-Ak
	for qemu-devel@nongnu.org; Wed, 20 May 2009 13:25:59 -0400
Received: from [199.232.76.173] (port=44959 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1M6pYB-0004VJ-0W
	for qemu-devel@nongnu.org; Wed, 20 May 2009 13:25:55 -0400
Received: from mx2.redhat.com ([66.187.237.31]:34226)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <uril@redhat.com>) id 1M6pYA-0003Mv-JP
	for qemu-devel@nongnu.org; Wed, 20 May 2009 13:25:54 -0400
Message-ID: <4A143D1D.9060800@redhat.com>
Date: Wed, 20 May 2009 20:25:49 +0300
From: Uri Lublin <uril@redhat.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence
	rule
References: <1242731347-1558-1-git-send-email-uril@redhat.com>	<4A12AD80.701@codemonkey.ws>	<20090519144101.GA16372@shell.devel.redhat.com>
	<4A12F5F6.2010003@codemonkey.ws>
In-Reply-To: <4A12F5F6.2010003@codemonkey.ws>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Glauber Costa <glommer@redhat.com>, Dor Laor <dlaor@redhat.com>, qemu-devel@nongnu.org

On 05/19/2009 09:09 PM, Anthony Liguori wrote:
> Glauber Costa wrote:
>> Another possibility is for the management tool to increase the
>> bandwidth for
>> little periods if it perceives that no progress is being made.
>>
>> Anyhow, I completely agree that we should not introduce this in qemu.
>>
>> However, maybe we could augment our "info migrate" to provide more
>> info about
>> the internal state of migration, so the mgmt tool can take a more
>> informed
>> decision?
> Yes, I've also suggested this before. I'm willing to expose just about
> any metric that makes sense. We need to be careful about not exposing
> implementation details, but things like iteration count, last working
> set size, average working set size, etc. should all be relatively stable
> metrics even if the implementation changes.
>

I agree we need to provide more information via "info migrate". That's not
enough, though.

In addition to augmenting "info migrate", we need to add more monitor commands
to set/change migration parameters (e.g. the current bandwidth limit), and change
the migration code to act according to those parameters. These commands should
take effect whether they are issued before or during migration.

Note that since a management tool will most likely call "info migrate" only
periodically, it may miss some of the (current/last) statistics.
Also, the first iteration may bias some of the averages.

How would you recognize "stuck" migrations? By comparing the Average Working Set
Size and the Average Iteration Transfer Size? By counting the number of
no-progress iterations? By the average "regression" of no-progress iterations?

The no-progress convergence rule was fairly easy to implement and gave a pretty
good heuristic for recognizing that the migration is stuck.
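
To make the idea concrete, here is a minimal sketch of one way such a rule
could be written. It is not the code from the patch: the struct, the function
names and the threshold are all hypothetical, and it assumes "progress" means
that the amount of RAM still to be sent drops below the smallest value seen so
far.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker for illustration; not QEMU's actual migration state. */
typedef struct {
    uint64_t best_remaining;   /* smallest remaining-RAM value seen so far */
    unsigned no_progress;      /* consecutive iterations without improvement */
    unsigned max_no_progress;  /* assumed threshold for declaring "stuck" */
} NoProgressTracker;

static void tracker_init(NoProgressTracker *t, unsigned max_no_progress)
{
    t->best_remaining = UINT64_MAX;
    t->no_progress = 0;
    t->max_no_progress = max_no_progress;
}

/*
 * Call once per iteration with the amount of RAM still to be sent.
 * Returns true once max_no_progress consecutive iterations have failed
 * to improve on the best value seen so far.
 */
static bool tracker_update(NoProgressTracker *t, uint64_t remaining)
{
    if (remaining < t->best_remaining) {
        t->best_remaining = remaining;
        t->no_progress = 0;
    } else {
        t->no_progress++;
    }
    return t->no_progress >= t->max_no_progress;
}
```

What to do once tracker_update() returns true (stop the guest and finish,
raise the bandwidth limit, or just report it) is a separate policy question.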

Regards,
     Uri.