Message-ID: <4A12AD80.701@codemonkey.ws>
Date: Tue, 19 May 2009 08:00:48 -0500
From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
References: <1242731347-1558-1-git-send-email-uril@redhat.com>
In-Reply-To: <1242731347-1558-1-git-send-email-uril@redhat.com>
To: Uri Lublin
Cc: qemu-devel@nongnu.org

Uri Lublin wrote:
> Currently the live-part (section QEMU_VM_SECTION_PART) of
> ram_save_live has only one convergence rule, which is
> when the number of dirty pages is smaller than a threshold.
>
> When the guest uses more memory pages than the threshold (e.g.
> playing a movie, copying files, sending/receiving many packets),
> it may take a very long time before convergence according to
> this rule.
>
> This patch (re)introduces a no-progress convergence rule, which limits
> the number of times the migration process is not progressing
> (and even regressing), with regard to the number of dirty
> pages. No-progress means that the number of pages that got
> dirty is larger than the number of pages that got transferred
> to the destination during the last transfer.
> This rule applies only after the first round (in which most
> memory pages are being transferred).
>
> Also this patch enlarges the number-dirty-pages threshold (of
> the first convergence rule) to 50 pages (was 10)
>
> Signed-off-by: Uri Lublin
>

The right place to do this is in a management tool. An arbitrary
convergence rule of 50 can do more harm than good.

For some set of users, it's better that live migration fail than that it
cause an arbitrarily long pause in the guest, which can result in dropped
TCP connections, soft lockups, and other badness.

A management tool can force convergence by issuing a "stop" command in
the monitor. I suspect a management tool cares more about wall-clock
time than about the number of iterations, too, so a valid metric would
be something along the lines of:

  if not converged after N seconds, issue the "stop" monitor command

where N is calculated from the available network bandwidth and the guest
memory size.

Regards,

Anthony Liguori
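
For illustration only, the wall-clock policy suggested above could be
sketched in a management tool roughly as follows. Everything here is an
assumption except the "stop" monitor command itself: the message only
says N depends on bandwidth and guest memory size, so the formula (time
for one full copy of guest RAM, scaled by a slack factor) is a
hypothetical choice, and is_converged / send_monitor_command are
hypothetical callables that the tool would have to supply.

```python
import time
from typing import Callable


def convergence_deadline_seconds(guest_memory_bytes: int,
                                 bandwidth_bytes_per_sec: float,
                                 slack_factor: float = 2.0) -> float:
    """Return N, the number of seconds to wait before forcing convergence.

    Hypothetical formula: the time needed to copy all guest RAM once at
    the available bandwidth, multiplied by an arbitrary slack factor.
    """
    if bandwidth_bytes_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    return slack_factor * guest_memory_bytes / bandwidth_bytes_per_sec


def force_convergence_after_deadline(
        is_converged: Callable[[], bool],
        send_monitor_command: Callable[[str], None],
        deadline_seconds: float,
        poll_interval_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Wait for the migration to converge; issue "stop" if it does not.

    Returns True if the migration converged on its own before the
    deadline, False if the deadline passed and "stop" was issued.
    """
    start = clock()
    while clock() - start < deadline_seconds:
        if is_converged():
            return True
        sleep(poll_interval_seconds)
    # Deadline passed without convergence: pause the guest so that no
    # more pages get dirtied and the remaining transfer can finish.
    send_monitor_command("stop")
    return False
```

Under this assumed formula, a 4 GiB guest migrating over a 128 MiB/s
link with the default slack factor of 2 gets N = 64 seconds. A tool that
prefers a failed migration to a long guest pause could cancel the
migration at the deadline instead of sending "stop".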