Date: Fri, 28 Feb 2014 10:16:17 +0000
From: "Dr. David Alan Gilbert"
Subject: Re: [Qemu-devel] [PATCH 5/7] migration: Fix the migrate auto converge process
Message-ID: <20140228101616.GF2695@work-vm>
In-Reply-To: <33183CC9F5247A488A2544077AF19020815D2295@SZXEMA503-MBS.china.huawei.com>
To: "Gonglei (Arei)"
Cc: "chenliang (T)", Peter Maydell, Juan Quintela, "pl@kamp.de", "qemu-devel@nongnu.org", "aliguori@amazon.com", "pbonzini@redhat.com"

* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Using the transfer speed of the migration thread to determine whether
> the migration is converging is inaccurate and complex, because dirty
> pages may be compressed by XBZRLE or ZERO_PAGE. The dirty bitmap sync
> counter keeps increasing if the migration cannot converge.
>
> Signed-off-by: ChenLiang
> Signed-off-by: Gonglei
> ---
>  arch_init.c | 26 +++-----------------------
>  1 file changed, 3 insertions(+), 23 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index fc71331..2211e0b 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -107,7 +107,6 @@ int graphic_depth = 32;
>
>  const uint32_t arch_type = QEMU_ARCH;
>  static bool mig_throttle_on;
> -static int dirty_rate_high_cnt;
>  static void check_guest_throttling(void);
>
>  static uint64_t bitmap_sync_cnt;
> @@ -464,17 +463,11 @@ static void migration_bitmap_sync(void)
>      uint64_t num_dirty_pages_init = migration_dirty_pages;
>      MigrationState *s = migrate_get_current();
>      static int64_t start_time;
> -    static int64_t bytes_xfer_prev;
>      static int64_t num_dirty_pages_period;
>      int64_t end_time;
> -    int64_t bytes_xfer_now;
>
>      increase_bitmap_sync_cnt();
>
> -    if (!bytes_xfer_prev) {
> -        bytes_xfer_prev = ram_bytes_transferred();
> -    }
> -
>      if (!start_time) {
>          start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      }
> @@ -493,21 +486,9 @@ static void migration_bitmap_sync(void)
>      /* more than 1 second = 1000 millisecons */
>      if (end_time > start_time + 1000) {
>          if (migrate_auto_converge()) {
> -            /* The following detection logic can be refined later. For now:
> -               Check to see if the dirtied bytes is 50% more than the approx.
> -               amount of bytes that just got transferred since the last time we
> -               were in this routine.
> -               If that happens >N times (for now N==4)
> -               we turn on the throttle down logic */
> -            bytes_xfer_now = ram_bytes_transferred();
> -            if (s->dirty_pages_rate &&
> -               (num_dirty_pages_period * TARGET_PAGE_SIZE >
> -                   (bytes_xfer_now - bytes_xfer_prev)/2) &&
> -               (dirty_rate_high_cnt++ > 4)) {
> -                trace_migration_throttle();
> -                mig_throttle_on = true;
> -                dirty_rate_high_cnt = 0;
> -            }
> -            bytes_xfer_prev = bytes_xfer_now;
> +            if (get_bitmap_sync_cnt() > 15) {
> +                mig_throttle_on = true;
> +            }

That is a lot simpler, and I suspect just as good - again, I'd move that
magic '15' to a constant somewhere.

What have you tested this on - have you tested with really big RAM VMs?
What's its behaviour like with rate-limiting?
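For illustration only - an untested, standalone sketch, not the real
arch_init.c code, and the constant name is just an example - the shape I
have in mind is a named threshold plus the "too many bitmap syncs means
we are not converging" check:

/* Toy sketch: throttle the guest once the dirty bitmap has been synced
 * more than a named threshold of times without the migration finishing. */
#include <stdbool.h>
#include <stdio.h>

#define SYNC_CNT_THROTTLE_THRESHOLD 15   /* instead of a bare '15' */

static unsigned int bitmap_sync_cnt;
static bool mig_throttle_on;

/* Called once per dirty-bitmap sync. */
static void bitmap_synced(void)
{
    bitmap_sync_cnt++;
    if (bitmap_sync_cnt > SYNC_CNT_THROTTLE_THRESHOLD) {
        mig_throttle_on = true;
    }
}

int main(void)
{
    for (int i = 0; i < 20; i++) {
        bitmap_synced();
    }
    printf("after %u syncs, throttle is %s\n",
           bitmap_sync_cnt, mig_throttle_on ? "on" : "off");
    return 0;
}

The only point is that a reader of migration_bitmap_sync() then sees what
the 15 means and where to tune it.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK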