From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42614)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1a76It-00041W-JY
	for qemu-devel@nongnu.org; Thu, 10 Dec 2015 13:51:00 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1a76Ip-0004iW-JR
	for qemu-devel@nongnu.org; Thu, 10 Dec 2015 13:50:59 -0500
Received: from mx1.redhat.com ([209.132.183.28]:54586)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1a76Ip-0004iH-BZ
	for qemu-devel@nongnu.org; Thu, 10 Dec 2015 13:50:55 -0500
Date: Thu, 10 Dec 2015 18:50:48 +0000
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20151210185048.GK2570@work-vm>
References: <1448357149-17572-1-git-send-email-zhang.zhanghailiang@huawei.com>
	<1448357149-17572-25-git-send-email-zhang.zhanghailiang@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1448357149-17572-25-git-send-email-zhang.zhanghailiang@huawei.com>
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v11 24/39] COLO: Implement
 failover work for Secondary VM
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com, hongyang.yang@easystack.cn

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> If users require SVM to takeover work, colo incoming thread should
> exit from loop while failover BH helps backing to migration incoming
> coroutine.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  migration/colo.c | 42 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 7a42fc6..f31e957 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -46,6 +46,33 @@ static bool colo_runstate_is_stopped(void)
>      return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
>  }
>  
> +static void secondary_vm_do_failover(void)
> +{
> +    int old_state;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
> +                      MIGRATION_STATUS_COMPLETED);
> +
> +    if (!autostart) {
> +        error_report("\"-S\" qemu option will be ignored in secondary side");
> +        /* recover runstate to normal migration finish state */
> +        autostart = true;
> +    }

libvirt may need a different mechanism here if it is to be involved
in the failover, but this is OK for now.

> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
> +                                   FAILOVER_STATUS_COMPLETED);
> +    if (old_state != FAILOVER_STATUS_HANDLING) {
> +        error_report("Serious error while do failover for secondary VM,"
> +                     "old_state: %d", old_state);

Same suggestion as for the previous patch, just to improve the error message.
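
To illustrate the pattern being checked above, here is a minimal standalone
sketch. It assumes failover_set_state() behaves like a compare-and-swap that
returns the state it found, which is what the old_state check in the patch
implies. The demo_* names and the message wording are hypothetical, not QEMU
code and not necessarily the wording suggested on the previous patch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical states, loosely mirroring the FAILOVER_STATUS_* names. */
enum demo_status {
    DEMO_NONE,
    DEMO_REQUEST,
    DEMO_HANDLING,
    DEMO_COMPLETED
};

static _Atomic int demo_state = DEMO_NONE;

/*
 * Try to move from 'expected' to 'new_state'. Returns the state that was
 * found; the transition happened only if the return value equals 'expected'.
 */
static int demo_set_state(int expected, int new_state)
{
    int seen = expected;

    /* On failure, 'seen' is overwritten with the actual current state. */
    atomic_compare_exchange_strong(&demo_state, &seen, new_state);
    return seen;
}

/* Returns 0 on success, -1 (with a message in 'msg') on a bad transition. */
static int demo_finish_failover(char *msg, size_t len)
{
    int old = demo_set_state(DEMO_HANDLING, DEMO_COMPLETED);

    if (old != DEMO_HANDLING) {
        /* Say which state was expected as well as which one was found. */
        snprintf(msg, len,
                 "failover: expected state %d (handling) but found %d",
                 DEMO_HANDLING, old);
        return -1;
    }
    return 0;
}
```

Reporting both the expected and the found state makes it easier to tell from
a log which transition was missed.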

> +        return;
> +    }
> +    /* For Secondary VM, jump to incoming co */
> +    if (mis->migration_incoming_co) {
> +        qemu_coroutine_enter(mis->migration_incoming_co, NULL);
> +    }
> +}
> +
>  static void primary_vm_do_failover(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -74,6 +101,8 @@ void colo_do_failover(MigrationState *s)
>  
>      if (get_colo_mode() == COLO_MODE_PRIMARY) {
>          primary_vm_do_failover();
> +    } else {
> +        secondary_vm_do_failover();
>      }
>  }
>  
> @@ -404,6 +433,12 @@ void *colo_process_incoming_thread(void *opaque)
>                  continue;
>              }
>          }
> +
> +        if (failover_request_is_active()) {
> +            error_report("failover request");
> +            goto out;
> +        }
> +
>          /* FIXME: This is unnecessary for periodic checkpoint mode */
>          ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0);
>          if (ret < 0) {
> @@ -473,10 +508,11 @@ out:
>          qemu_fclose(fb);
>      }
>      qsb_free(buffer);
> -
> -    qemu_mutex_lock_iothread();
> +    /* Here, we can ensure BH is hold the global lock, and will join colo
> +    * incoming thread, so here it is not necessary to lock here again,
> +    * or there will be a deadlock error.
> +    */
>      colo_release_ram_cache();
> -    qemu_mutex_unlock_iothread();

OK, I think I understand that: because we know there is a failover request
active, the failover BH must already be holding the lock?
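
To spell out my reading of the comment, here is a small pthread sketch of the
ordering I think is being relied on. It is not QEMU code: big_lock stands in
for the iothread lock, and failover_bh() and incoming_thread() are
hypothetical stand-ins for the BH and the colo incoming thread. The joiner
holds the lock for the whole join, so the exiting thread must clean up
without taking it.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are QEMU APIs. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool failover_requested;
static atomic_bool cache_released;

static void *incoming_thread(void *opaque)
{
    (void)opaque;

    /* Main loop: keep going until a failover request shows up. */
    while (!atomic_load(&failover_requested)) {
        sched_yield();
    }

    /*
     * Cleanup path. Locking big_lock here would block forever, because
     * the joiner holds it until this thread has exited.
     */
    atomic_store(&cache_released, true);
    return NULL;
}

/* Runs with big_lock held for its whole duration, as a BH would. */
static bool failover_bh(pthread_t thread)
{
    bool joined;

    pthread_mutex_lock(&big_lock);
    atomic_store(&failover_requested, true);
    joined = pthread_join(thread, NULL) == 0;
    pthread_mutex_unlock(&big_lock);

    return joined && atomic_load(&cache_released);
}

/* Start the worker, then request failover and join it under the lock. */
static bool run_failover_demo(void)
{
    pthread_t thread;

    if (pthread_create(&thread, NULL, incoming_thread, NULL) != 0) {
        return false;
    }
    return failover_bh(thread);
}
```

If the thread can also reach the same cleanup label without a failover
request, the lock would not be held there, so that path may be worth a second
look.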

Other than the error message improvement:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Dave

>  
>      if (mis->to_src_file) {
>          qemu_fclose(mis->to_src_file);
> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK