From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1M6pNr-0008Ku-Qj
	for qemu-devel@nongnu.org; Wed, 20 May 2009 13:15:15 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1M6pNn-0008Jg-G8
	for qemu-devel@nongnu.org; Wed, 20 May 2009 13:15:15 -0400
Received: from [199.232.76.173] (port=48253 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1M6pNn-0008Jd-DQ
	for qemu-devel@nongnu.org; Wed, 20 May 2009 13:15:11 -0400
Received: from mx1.redhat.com ([66.187.233.31]:38598)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <berrange@redhat.com>) id 1M6pNm-0001kY-Vv
	for qemu-devel@nongnu.org; Wed, 20 May 2009 13:15:11 -0400
Date: Wed, 20 May 2009 18:15:03 +0100
From: "Daniel P. Berrange" <berrange@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence
	rule
Message-ID: <20090520171503.GV29798@redhat.com>
References: <1242731347-1558-1-git-send-email-uril@redhat.com>
	<4A12AD80.701@codemonkey.ws>
	<20090519144101.GA16372@shell.devel.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090519144101.GA16372@shell.devel.redhat.com>
Reply-To: "Daniel P. Berrange" <berrange@redhat.com>
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Glauber Costa <glommer@redhat.com>
Cc: Uri Lublin <uril@redhat.com>, qemu-devel@nongnu.org

On Tue, May 19, 2009 at 10:41:01AM -0400, Glauber Costa wrote:
> On Tue, May 19, 2009 at 08:00:48AM -0500, Anthony Liguori wrote:
> > Uri Lublin wrote:
> > 
> > The right place to do this is in a management tool.  An arbitrary 
> > convergence rule of 50 can do more damage than good.
> > 
> > For some set of users, it's better that live migration fail than it 
> > cause an arbitrarily long pause in the guest which can result in dropped 
> > TCP connections, soft lock ups, and other badness.
> > 
> > A management tool can force convergence by issuing a "stop" command in 
> > the monitor.  I suspect a management tool cares more about wall-clock 
> > time than number of iterations too so a valid metric would be something 
> > along the lines of if not converged after N seconds, issue stop monitor 
> > command where N is calculated from available network bandwidth and guest 
> > memory size.
> Another possibility is for the management tool to increase the bandwidth for
> little periods if it perceives that no progress is being made.
> 
> Anyhow, I completely agree that we should not introduce this in qemu.
> 
> However, maybe we could augment our "info migrate" to provide more info about
> the internal state of migration, so the mgmt tool can take a more informed
> decision?

Yes, I think an 'info migrate' command is necessary regardless of
whether convergence is progressing successfully or not. If a migration
takes 60 seconds total in the normal case, it is useful if the mgmt tool
can get some indication of how far it has progressed, e.g. report
pages sent, pages remaining, and total pages. NB, sent + remaining != total.
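
To illustrate why the three counters do not add up: a page the guest
dirties after it was sent has to be sent again, so "sent" can exceed
"total". A rough sketch of how a mgmt tool might interpret the numbers
follows. The class and field names are hypothetical; this is not an
existing QEMU output format.

```python
from dataclasses import dataclass


@dataclass
class MigrationProgress:
    # Hypothetical counters a mgmt tool would read from the monitor.
    pages_sent: int
    pages_remaining: int
    pages_total: int

    def redirtied_pages(self) -> int:
        """Extra pages of work caused by the guest re-dirtying memory.

        Zero when sent + remaining == total, i.e. nothing sent so far
        has had to be queued for sending again.
        """
        return max(0, self.pages_sent + self.pages_remaining - self.pages_total)

    def fraction_remaining(self) -> float:
        """Share of guest RAM still waiting to be transferred."""
        return self.pages_remaining / self.pages_total


progress = MigrationProgress(pages_sent=1500, pages_remaining=200, pages_total=1000)
print(progress.redirtied_pages(), progress.fraction_remaining())
```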

The mgmt tool knows how much wall clock time has elapsed, and can
either present progress info to the admin, so that after x seconds have
elapsed the admin can see if it has nearly completed, or is stuck, and
make a decision, or let the mgmt app apply policies of its own, cancelling
the migration, or pausing the guest to let it complete non-live.
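
As a minimal sketch of such a policy living in the mgmt app rather than
in QEMU: given elapsed wall clock time and a few recent samples of the
remaining page count, pick one of wait / pause the guest / cancel. All
names, the deadline and the "stuck" test are assumptions made up for
illustration, not anything QEMU or libvirt implements.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    WAIT = "wait"
    PAUSE_GUEST = "pause_guest"  # complete the migration non-live
    CANCEL = "cancel"


def decide(elapsed_s: float, deadline_s: float,
           remaining_samples: Sequence[int], allow_pause: bool) -> Action:
    """Hypothetical mgmt-side policy for a live migration.

    remaining_samples holds recent 'pages remaining' readings, oldest first.
    """
    # Before the deadline, always keep going.
    if elapsed_s < deadline_s:
        return Action.WAIT
    # "Stuck" here means the remaining count has not dropped across the
    # sampled window; with fewer than two samples we cannot tell.
    stuck = (len(remaining_samples) >= 2
             and remaining_samples[-1] >= remaining_samples[0])
    if not stuck:
        return Action.WAIT
    # Some users prefer a failed migration over a long guest pause,
    # so pausing has to be opted into.
    return Action.PAUSE_GUEST if allow_pause else Action.CANCEL


print(decide(90, 60, [500, 520, 510], allow_pause=False))
```

The allow_pause flag is there because, as noted earlier in the thread,
for some users a failed migration is better than an arbitrarily long
pause of the guest.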

If you want to put a policy into QEMU too, go ahead, as long as it's
optional so a mgmt app can have full control if desired.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|