Message-ID: <4A12F744.3080809@codemonkey.ws>
Date: Tue, 19 May 2009 13:15:32 -0500
From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH] ram_save_live: add a no-progress convergence rule
References: <1242731347-1558-1-git-send-email-uril@redhat.com> <4A12AD80.701@codemonkey.ws> <20090519144101.GA16372@shell.devel.redhat.com> <4A12C942.4090708@redhat.com>
In-Reply-To: <4A12C942.4090708@redhat.com>
List-Id: qemu-devel.nongnu.org
To: dlaor@redhat.com
Cc: Glauber Costa, Uri Lublin, qemu-devel@nongnu.org

Dor Laor wrote:
> The problem is that if migration is not progressing since the guest is
> dirtying pages faster than the migration protocol can send, then we just
> waste time and cpu.
> The minimum is to notify the monitor interface in order to let the mgmt
> daemon trap it.
> We can easily see this issue while running iperf in the guest or any
> other high load/dirty pages scenario.

The problem is, what's the metric for determining the guest isn't progressing? A raw iteration count is not a valid metric. It may be expected that the migration takes 50 iterations.

The management tool knows the guest isn't progressing when it decides that a guest isn't progressing :-)

> We can also make it configurable using the monitor migrate command.
> For example:
> migrate -d -no_progress -threshold=x tcp:....

Threshold is really a bad metric to use. You have no idea how much data has been passed in each iteration. If you only needed one more iteration, then stopping the migration short was a really bad idea.

The only thing that this does is give a false sense of security. Management tools have to deal with forcing migration convergence based on policies. If a management tool isn't doing this today, it's broken IMHO.

Basically, threshold introduces a regression. If you run iperf and migrate a guest with a very large memory size, after migration, you'll get soft lockups because the guest hasn't been running for 10 seconds. This is bad.

Regards,

Anthony Liguori
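
The point about iteration counts can be illustrated with a toy model. The sketch below is not QEMU code and does not reflect how ram_save_live works internally. It assumes a simplified pre-copy loop in which each iteration sends all currently dirty data at a fixed bandwidth while the guest dirties memory at a fixed rate. The function names, parameters, and numbers are all hypothetical.

```python
def precopy_history(total_bytes, bandwidth, dirty_rate, max_downtime, max_iters=100):
    """Toy pre-copy model (an assumption, not QEMU's algorithm).

    Returns the number of bytes still dirty at the start of each iteration.
    The loop stops once the remainder could be sent within max_downtime
    seconds, or after max_iters iterations.
    """
    remaining = float(total_bytes)
    history = []
    for _ in range(max_iters):
        history.append(remaining)
        if remaining / bandwidth <= max_downtime:
            break  # the final stop-and-copy phase would fit in the budget
        send_time = remaining / bandwidth
        # While this round is being sent, the guest dirties more memory,
        # but never more than the guest's total RAM.
        remaining = min(float(total_bytes), dirty_rate * send_time)
    return history


def is_shrinking(history, window=3):
    """Progress metric based on data volume rather than iteration count."""
    if len(history) <= window:
        return True  # too early to judge
    return history[-1] < history[-1 - window]


# Guest A dirties memory slightly slower than the link can send it.
slow = precopy_history(4e9, bandwidth=100e6, dirty_rate=90e6, max_downtime=0.03)
# Guest B dirties memory faster than the link can send it.
stuck = precopy_history(4e9, bandwidth=100e6, dirty_rate=120e6, max_downtime=0.03)

print(len(slow), is_shrinking(slow))
print(len(stuck), is_shrinking(stuck))
```

In this model guest A needs more than 50 iterations yet still converges, so an iteration-count cutoff would abort a migration that was about to finish, while guest B never shrinks its dirty set no matter how many iterations are allowed. Telling the two apart requires looking at how much data remains, which is the kind of policy decision the email argues belongs in the management tool.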