Message-ID: <4E70C6B9.5070203@linux.vnet.ibm.com>
Date: Wed, 14 Sep 2011 10:22:33 -0500
From: Michael Roth
Subject: Re: [Qemu-devel] [PATCH] A small patch to introduce stop conditions
 to the live migration.
To: Thomas Treutner
Cc: qemu-devel@nongnu.org
In-Reply-To: <20110914131819.GA29426@puenktchen.ani.univie.ac.at>
References: <20110914131819.GA29426@puenktchen.ani.univie.ac.at>

On 09/14/2011 08:18 AM, Thomas Treutner wrote:
> Currently, it is possible that a live migration never finishes when the
> dirty page rate is high compared to the scan/transfer rate.
> The exact values for MAX_MEMORY_ITERATIONS and
> MAX_TOTAL_MEMORY_TRANSFER_FACTOR are arguable, but there should be
> *some* limit to force the final iteration of a live migration that does
> not converge.
>
> ---
>  arch_init.c |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 4486925..57fcb1e 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -89,6 +89,9 @@ const uint32_t arch_type = QEMU_ARCH;
>  #define RAM_SAVE_FLAG_EOS      0x10
>  #define RAM_SAVE_FLAG_CONTINUE 0x20
>
> +#define MAX_MEMORY_ITERATIONS 10
> +#define MAX_TOTAL_MEMORY_TRANSFER_FACTOR 3
> +
>  static int is_dup_page(uint8_t *page, uint8_t ch)
>  {
>      uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
> @@ -107,6 +110,8 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
>  static RAMBlock *last_block;
>  static ram_addr_t last_offset;
>
> +static int numberFullMemoryIterations = 0;
> +
>  static int ram_save_block(QEMUFile *f)
>  {
>      RAMBlock *block = last_block;
> @@ -158,7 +163,10 @@ static int ram_save_block(QEMUFile *f)
>              offset = 0;
>              block = QLIST_NEXT(block, next);
>              if (!block)
> +            {
> +                numberFullMemoryIterations++;
>                  block = QLIST_FIRST(&ram_list.blocks);
> +            }
>          }
>
>      current_addr = block->offset + offset;
> @@ -295,7 +303,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>
>      expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
>
> -    return (stage == 2) && (expected_time <= migrate_max_downtime());
> +    return (stage == 2) && ((expected_time <= migrate_max_downtime()) || (numberFullMemoryIterations == MAX_MEMORY_ITERATIONS) || (bytes_transferred > (MAX_TOTAL_MEMORY_TRANSFER_FACTOR * ram_bytes_total())));
>  }
>
>  static inline void *host_from_stream_offset(QEMUFile *f,

To me it seems like a simpler solution is to do something like:

    return (stage == 2) && current_time() + expected_time < migrate_deadline()

where migrate_deadline() is the time that the migration began plus
migrate_max_downtime().
Currently, it looks like migrate_max_downtime() is being applied on a per-iteration basis rather than per-migration, which seems like a bug to me. Block migration seems to suffer from this as well...
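For reference, the migrate_deadline() idea could be sketched roughly as
below. All of the helper names, the stub clock, and the 300 ms cap are
hypothetical stand-ins for illustration, not QEMU's actual API:

```c
#include <stdint.h>
#include <stdbool.h>

/* Recorded once, when the migration begins (hypothetical). */
static int64_t migration_start_time_ms;

static int64_t migrate_max_downtime_ms(void)
{
    return 300; /* assumed downtime cap in ms, for illustration only */
}

static int64_t current_time_ms(void)
{
    return 0; /* stub; a real implementation would read a monotonic clock */
}

/* The deadline is fixed per-migration, not per-iteration: the time the
 * migration began plus the allowed downtime. */
static int64_t migrate_deadline(void)
{
    return migration_start_time_ms + migrate_max_downtime_ms();
}

/* Convergence test in the shape suggested above: returns true (allowing
 * the final stop-and-copy stage) only while the estimated completion
 * time still fits within the per-migration deadline. */
static bool ram_save_converged(int stage, int64_t expected_time_ms)
{
    return (stage == 2) &&
           (current_time_ms() + expected_time_ms < migrate_deadline());
}
```

Because the deadline is computed from the migration's start time rather
than recomputed each pass, the limit applies to the migration as a whole,
which is the per-migration semantics argued for above.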