From: Michael Roth
To: Thomas Treutner
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] A small patch to introduce stop conditions to the live migration.
Date: Wed, 14 Sep 2011 10:36:23 -0500
Message-ID: <4E70C9F7.2090900@linux.vnet.ibm.com>
In-Reply-To: <4E70C6B9.5070203@linux.vnet.ibm.com>
References: <20110914131819.GA29426@puenktchen.ani.univie.ac.at> <4E70C6B9.5070203@linux.vnet.ibm.com>

On 09/14/2011 10:22 AM, Michael Roth wrote:
> On 09/14/2011 08:18 AM, Thomas Treutner wrote:
>> Currently, it is possible that a live migration never finishes when the
>> dirty page rate is high compared to the scan/transfer rate. The exact
>> values for MAX_MEMORY_ITERATIONS and MAX_TOTAL_MEMORY_TRANSFER_FACTOR
>> are arguable, but there should be *some* limit to force the final
>> iteration of a live migration that does not converge.
>>
>> ---
>>  arch_init.c |   10 +++++++++-
>>  1 files changed, 9 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 4486925..57fcb1e 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -89,6 +89,9 @@ const uint32_t arch_type = QEMU_ARCH;
>>  #define RAM_SAVE_FLAG_EOS      0x10
>>  #define RAM_SAVE_FLAG_CONTINUE 0x20
>>
>> +#define MAX_MEMORY_ITERATIONS 10
>> +#define MAX_TOTAL_MEMORY_TRANSFER_FACTOR 3
>> +
>>  static int is_dup_page(uint8_t *page, uint8_t ch)
>>  {
>>      uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
>> @@ -107,6 +110,8 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
>>  static RAMBlock *last_block;
>>  static ram_addr_t last_offset;
>>
>> +static int numberFullMemoryIterations = 0;
>> +
>>  static int ram_save_block(QEMUFile *f)
>>  {
>>      RAMBlock *block = last_block;
>> @@ -158,7 +163,10 @@ static int ram_save_block(QEMUFile *f)
>>              offset = 0;
>>              block = QLIST_NEXT(block, next);
>>              if (!block)
>> +            {
>> +                numberFullMemoryIterations++;
>>                  block = QLIST_FIRST(&ram_list.blocks);
>> +            }
>>          }
>>
>>          current_addr = block->offset + offset;
>> @@ -295,7 +303,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>>
>>      expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
>>
>> -    return (stage == 2) && (expected_time <= migrate_max_downtime());
>> +    return (stage == 2) && ((expected_time <= migrate_max_downtime() || (numberFullMemoryIterations == MAX_MEMORY_ITERATIONS) || (bytes_transferred > (MAX_TOTAL_MEMORY_TRANSFER_FACTOR * ram_bytes_total()))));
>>  }
>>
>>  static inline void *host_from_stream_offset(QEMUFile *f,
>
> To me it seems like a simpler solution is to do something like:
>
> return (stage == 2) && current_time() + expected_time < migrate_deadline()
>
> where migrate_deadline() is the time that the migration began plus
> migrate_max_downtime().
>
> Currently, it looks like migrate_max_downtime() is being applied on a
> per-iteration basis rather than per-migration, which seems like a bug
> to me. Block migration seems to suffer from this as well...

Sorry, ignore this, that calculation's just for stage 3.
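
Just for illustration, a rough sketch of what that deadline-style check could
look like in arch_init.c. current_time(), migrate_deadline() and
migration_start_time are the placeholder names from the suggestion above, not
existing QEMU interfaces, and qemu_get_clock_ns(rt_clock) is only assumed here
as a suitable clock source:

/* Sketch only: placeholder helpers, not existing QEMU code. */
static int64_t migration_start_time;    /* would be recorded when migration begins */

static int64_t current_time(void)
{
    return qemu_get_clock_ns(rt_clock); /* assumed nanosecond clock source */
}

static int64_t migrate_deadline(void)
{
    /* time the migration began plus the configured maximum downtime */
    return migration_start_time + migrate_max_downtime();
}

/* then, in ram_save_live() for stage 2, as written in the suggestion above: */
return (stage == 2) && (current_time() + expected_time < migrate_deadline());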