From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50145)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <quintela@redhat.com>) id 1WNTbY-0006pX-Vp
	for qemu-devel@nongnu.org; Tue, 11 Mar 2014 16:48:57 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <quintela@redhat.com>) id 1WNTbT-0004e5-QD
	for qemu-devel@nongnu.org; Tue, 11 Mar 2014 16:48:52 -0400
Received: from mx1.redhat.com ([209.132.183.28]:60548)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <quintela@redhat.com>) id 1WNTbT-0004e1-IH
	for qemu-devel@nongnu.org; Tue, 11 Mar 2014 16:48:47 -0400
From: Juan Quintela <quintela@redhat.com>
In-Reply-To: <1394542415-5152-6-git-send-email-arei.gonglei@huawei.com> (arei
	gonglei's message of "Tue, 11 Mar 2014 20:53:30 +0800")
References: <1394542415-5152-1-git-send-email-arei.gonglei@huawei.com>
	<1394542415-5152-6-git-send-email-arei.gonglei@huawei.com>
Date: Tue, 11 Mar 2014 21:48:27 +0100
Message-ID: <87fvmo31ic.fsf@elfo.mitica>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] [PATCH 05/10] migration: Fix the migrate auto
	converge process
Reply-To: quintela@redhat.com
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: arei.gonglei@huawei.com
Cc: ChenLiang <chenliang88@huawei.com>, weidong.huang@huawei.com, chegu_vinod@hp.com, qemu-devel@nongnu.org, owasserm@redhat.com, pbonzini@redhat.com

<arei.gonglei@huawei.com> wrote:
> From: ChenLiang <chenliang88@huawei.com>
>
> It is inaccuracy and complex that using the transfer speed of
> migration thread to determine whether the convergence migration.
> The dirty page may be compressed by XBZRLE or ZERO_PAGE.The counter
> of updating dirty bitmap will be increasing continuously if the
> migration can't convergence.

"It is inexact and complex to use the migration transfer speed to
determine whether the migration converges."

> @@ -530,21 +523,11 @@ static void migration_bitmap_sync(void)
>      /* more than 1 second = 1000 millisecons */
>      if (end_time > start_time + 1000) {
>          if (migrate_auto_converge()) {
> -            /* The following detection logic can be refined later. For now:
> -               Check to see if the dirtied bytes is 50% more than the approx.
> -               amount of bytes that just got transferred since the last time we
> -               were in this routine. If that happens >N times (for now N==4)
> -               we turn on the throttle down logic */
> -            bytes_xfer_now = ram_bytes_transferred();
> -            if (s->dirty_pages_rate &&
> -               (num_dirty_pages_period * TARGET_PAGE_SIZE >
> -                   (bytes_xfer_now - bytes_xfer_prev)/2) &&
> -               (dirty_rate_high_cnt++ > 4)) {
> -                    trace_migration_throttle();
> -                    mig_throttle_on = true;
> -                    dirty_rate_high_cnt = 0;
> -             }
> -             bytes_xfer_prev = bytes_xfer_now;
> +            if (get_bitmap_sync_cnt() > 15) {
> +                /* It indicates that migration can't converge when the counter
> +                is larger than fifteen. Enable the feature of auto
>      converge */

Comment is not needed, it says exactly what the code does.

But why 15?  It is not that I think that the older code is better or
worse than yours.  Just that we move from one magic number to another
(that is even bigger).

Wouldn't it be easier to just change mig_sleep_cpu()

to do something like:


static void mig_sleep_cpu(void *opq)
{
    qemu_mutex_unlock_iothread();
    /* Sleep 2 ms per bitmap sync (g_usleep() takes microseconds). */
    g_usleep(2 * get_bitmap_sync_cnt() * 1000);
    qemu_mutex_lock_iothread();
}

This would give 30 ms on the 15th iteration.  I am open to changing that
formula to anything different, but what I want is to change this to
something where the less convergence there is, the more throttling we do.
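
To make the arithmetic concrete, here is a minimal standalone sketch of
that formula.  The helper name throttle_sleep_us() and both constants are
made up for illustration and are not in QEMU; the 100 ms cap in particular
is only an assumption, one possible variation on the formula so that the
sleep does not grow without bound.

```c
#include <stdint.h>

/* Hypothetical constants, not taken from QEMU. */
#define THROTTLE_US_PER_SYNC 2000u    /* 2 ms per bitmap sync, as proposed */
#define THROTTLE_MAX_US      100000u  /* assumed upper bound of 100 ms */

/*
 * Hypothetical helper: map the number of dirty-bitmap syncs seen so far
 * to a sleep time in microseconds.  The sleep grows linearly with the
 * sync count and saturates at THROTTLE_MAX_US.
 */
static uint64_t throttle_sleep_us(uint64_t sync_cnt)
{
    /* Compare before multiplying so a huge count cannot overflow. */
    if (sync_cnt >= THROTTLE_MAX_US / THROTTLE_US_PER_SYNC) {
        return THROTTLE_MAX_US;
    }
    return sync_cnt * THROTTLE_US_PER_SYNC;
}
```

With these numbers, 15 syncs give 30000 us (the 30 ms above), and the cap
is reached at 50 syncs.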

BTW, are you testing this with any workload to see that it improves?


> +                mig_throttle_on = true;
> +            }

Vinod, what do you think?

Do you have a workload to test this?

Thanks, Juan.