Date: Wed, 20 May 2009 18:15:03 +0100
From: "Daniel P. Berrange"
Subject: Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
Message-ID: <20090520171503.GV29798@redhat.com>
References: <1242731347-1558-1-git-send-email-uril@redhat.com> <4A12AD80.701@codemonkey.ws> <20090519144101.GA16372@shell.devel.redhat.com>
In-Reply-To: <20090519144101.GA16372@shell.devel.redhat.com>
To: Glauber Costa
Cc: Uri Lublin, qemu-devel@nongnu.org

On Tue, May 19, 2009 at 10:41:01AM -0400, Glauber Costa wrote:
> On Tue, May 19, 2009 at 08:00:48AM -0500, Anthony Liguori wrote:
> > Uri Lublin wrote:
> >
> > The right place to do this is in a management tool. An arbitrary
> > convergence rule of 50 can do more damage than good.
> >
> > For some set of users, it's better that live migration fail than it
> > cause an arbitrarily long pause in the guest which can result in dropped
> > TCP connections, soft lock ups, and other badness.
> >
> > A management tool can force convergence by issuing a "stop" command in
> > the monitor.
> > I suspect a management tool cares more about wall-clock
> > time than number of iterations too so a valid metric would be something
> > along the lines of if not converged after N seconds, issue stop monitor
> > command where N is calculated from available network bandwidth and guest
> > memory size.
> Another possibility is for the management tool to increase the bandwidth for
> little periods if it perceives that no progress is being made.
>
> Anyhow, I completely agree that we should not introduce this in qemu.
>
> However, maybe we could augment our "info migrate" to provide more info about
> the internal state of migration, so the mgmt tool can take a more informed
> decision?

Yes, I think an 'info migration' command is necessary regardless of whether
convergence is progressing successfully or not. If a migration takes 60
seconds in total in the normal case, it is useful if the mgmt tool can get
some indication of how far it has progressed, e.g. report pages sent, pages
remaining, and total pages. NB, sent + remaining != total, since pages the
guest dirties again are sent more than once.

The mgmt tool knows how much wall-clock time has elapsed, and can either
present progress info to the admin, so that after x seconds have elapsed the
admin can see whether it has nearly completed or is stuck and make a
decision, or let the mgmt app apply policies of its own, cancelling the
migration or pausing the guest to let it complete non-live.

If you want to put a policy into QEMU too, go ahead, as long as it's optional
so a mgmt app can have full control if desired.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
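
For illustration only, the management-tool policy discussed in this thread (a deadline N derived from bandwidth and guest memory size, then either pausing the guest or cancelling) might look roughly like the sketch below. Everything in it is an assumption made for the example: `Progress`, `deadline_seconds`, `decide` and the `slack` factor are hypothetical names, not part of QEMU or any management tool, and how the tool would actually issue "stop" or cancel through the monitor is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"          # keep migrating live
    STOP_GUEST = "stop"    # pause the guest so the migration completes non-live
    CANCEL = "cancel"      # give up on the migration


@dataclass
class Progress:
    # Hypothetical counters of the kind a progress report could expose.
    # Note: pages_sent + pages_remaining need not equal pages_total.
    pages_sent: int
    pages_remaining: int
    pages_total: int


def deadline_seconds(guest_mem_bytes, bandwidth_bytes_per_s, slack=2.0):
    """Deadline N: `slack` times the time for one full pass over guest memory.

    The slack factor is an arbitrary illustrative choice.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return slack * guest_mem_bytes / bandwidth_bytes_per_s


def decide(elapsed_s, deadline_s, progress, allow_pause):
    """Pick an action from wall-clock time, not from iteration counts."""
    if progress.pages_remaining == 0:
        return Action.WAIT  # nothing left to send; let it finish
    if elapsed_s < deadline_s:
        return Action.WAIT
    # Deadline passed without converging: policy belongs to the tool/admin.
    return Action.STOP_GUEST if allow_pause else Action.CANCEL


if __name__ == "__main__":
    n = deadline_seconds(4 * 1024**3, 100 * 1024**2)  # 4 GiB guest, 100 MiB/s
    p = Progress(pages_sent=1_500_000, pages_remaining=20_000,
                 pages_total=1_048_576)
    print(decide(elapsed_s=100.0, deadline_s=n, progress=p, allow_pause=True))
```

Whether to pause or cancel once the deadline passes stays a per-deployment choice (`allow_pause` here), which matches the point that some users would rather see the migration fail than have the guest paused.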