From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50145)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1WNTbY-0006pX-Vp
	for qemu-devel@nongnu.org; Tue, 11 Mar 2014 16:48:57 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from )
	id 1WNTbT-0004e5-QD
	for qemu-devel@nongnu.org; Tue, 11 Mar 2014 16:48:52 -0400
Received: from mx1.redhat.com ([209.132.183.28]:60548)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1WNTbT-0004e1-IH
	for qemu-devel@nongnu.org; Tue, 11 Mar 2014 16:48:47 -0400
From: Juan Quintela
In-Reply-To: <1394542415-5152-6-git-send-email-arei.gonglei@huawei.com>
	(arei gonglei's message of "Tue, 11 Mar 2014 20:53:30 +0800")
References: <1394542415-5152-1-git-send-email-arei.gonglei@huawei.com>
	<1394542415-5152-6-git-send-email-arei.gonglei@huawei.com>
Date: Tue, 11 Mar 2014 21:48:27 +0100
Message-ID: <87fvmo31ic.fsf@elfo.mitica>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] [PATCH 05/10] migration: Fix the migrate auto converge process
Reply-To: quintela@redhat.com
To: arei.gonglei@huawei.com
Cc: ChenLiang , weidong.huang@huawei.com, chegu_vinod@hp.com,
	qemu-devel@nongnu.org, owasserm@redhat.com, pbonzini@redhat.com

<arei.gonglei@huawei.com> wrote:
> From: ChenLiang
>
> It is inaccuracy and complex that using the transfer speed of
> migration thread to determine whether the convergence migration.
> The dirty page may be compressed by XBZRLE or ZERO_PAGE.The counter
> of updating dirty bitmap will be increasing continuously if the
> migration can't convergence.

"It is inexact and complex to use the migration transfer speed to
determine whether the migration converges."

> @@ -530,21 +523,11 @@ static void migration_bitmap_sync(void)
>      /* more than 1 second = 1000 millisecons */
>      if (end_time > start_time + 1000) {
>          if (migrate_auto_converge()) {
> -            /* The following detection logic can be refined later. For now:
> -               Check to see if the dirtied bytes is 50% more than the approx.
> -               amount of bytes that just got transferred since the last time we
> -               were in this routine. If that happens >N times (for now N==4)
> -               we turn on the throttle down logic */
> -            bytes_xfer_now = ram_bytes_transferred();
> -            if (s->dirty_pages_rate &&
> -               (num_dirty_pages_period * TARGET_PAGE_SIZE >
> -                   (bytes_xfer_now - bytes_xfer_prev)/2) &&
> -               (dirty_rate_high_cnt++ > 4)) {
> -                    trace_migration_throttle();
> -                    mig_throttle_on = true;
> -                    dirty_rate_high_cnt = 0;
> -            }
> -            bytes_xfer_prev = bytes_xfer_now;
> +            if (get_bitmap_sync_cnt() > 15) {
> +                /* It indicates that migration can't converge when the counter
> +                   is larger than fifteen. Enable the feature of auto converge */

The comment is not needed; it says exactly what the code does.

But why 15?  It is not that I think the older code is better or worse
than yours, just that we move from one magic number to another (one
that is even bigger).

Wouldn't it be easier to just change mig_sleep_cpu() to do something
like:

static void mig_sleep_cpu(void *opq)
{
    qemu_mutex_unlock_iothread();
    g_usleep(2 * get_bitmap_sync_cnt() * 1000);
    qemu_mutex_lock_iothread();
}

This would give a 30ms sleep on the 15th iteration (2 * 15 * 1000us).
I am open to changing that formula to anything different, but what I
want is to change this into something where the less the migration
converges, the more throttling we apply.

BTW, are you testing this with any workload to see that it improves
things?

> +                mig_throttle_on = true;
> +            }

Vinod, what do you think?  Do you have a workload to test this?

Thanks, Juan.