From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60889)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1a7J3i-0007Si-Fd
	for qemu-devel@nongnu.org; Fri, 11 Dec 2015 03:28:14 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1a7J3e-0003ge-8h
	for qemu-devel@nongnu.org; Fri, 11 Dec 2015 03:28:10 -0500
Received: from szxga03-in.huawei.com ([119.145.14.66]:53915)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1a7J3d-0003fo-EZ
	for qemu-devel@nongnu.org; Fri, 11 Dec 2015 03:28:06 -0500
References: <1448357149-17572-1-git-send-email-zhang.zhanghailiang@huawei.com>
	<1448357149-17572-25-git-send-email-zhang.zhanghailiang@huawei.com>
	<20151210185048.GK2570@work-vm>
From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Message-ID: <566A88FB.60000@huawei.com>
Date: Fri, 11 Dec 2015 16:27:39 +0800
MIME-Version: 1.0
In-Reply-To: <20151210185048.GK2570@work-vm>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v11 24/39] COLO: Implement
 failover work for Secondary VM
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com, hongyang.yang@easystack.cn

On 2015/12/11 2:50, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> If users require the SVM to take over the work, the colo incoming thread
>> should exit from its loop, while the failover BH helps switch back to the
>> migration incoming coroutine.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   migration/colo.c | 42 +++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 39 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 7a42fc6..f31e957 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -46,6 +46,33 @@ static bool colo_runstate_is_stopped(void)
>>       return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
>>   }
>>
>> +static void secondary_vm_do_failover(void)
>> +{
>> +    int old_state;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +
>> +    migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
>> +                      MIGRATION_STATUS_COMPLETED);
>> +
>> +    if (!autostart) {
>> +        error_report("\"-S\" qemu option will be ignored in secondary side");
>> +        /* recover runstate to normal migration finish state */
>> +        autostart = true;
>> +    }
>
> You might find libvirt will need something different for it to be
> involved during the failover; but for now OK.
>

>> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>> +                                   FAILOVER_STATUS_COMPLETED);
>> +    if (old_state != FAILOVER_STATUS_HANDLING) {
>> +        error_report("Serious error while do failover for secondary VM,"
>> +                     "old_state: %d", old_state);
>
> Same suggestion as previous patch just to improve the error message.
>

OK, I will fix it in the next version.

>> +        return;
>> +    }
>> +    /* For Secondary VM, jump to incoming co */
>> +    if (mis->migration_incoming_co) {
>> +        qemu_coroutine_enter(mis->migration_incoming_co, NULL);
>> +    }
>> +}
>> +
>>   static void primary_vm_do_failover(void)
>>   {
>>       MigrationState *s = migrate_get_current();
>> @@ -74,6 +101,8 @@ void colo_do_failover(MigrationState *s)
>>
>>       if (get_colo_mode() == COLO_MODE_PRIMARY) {
>>           primary_vm_do_failover();
>> +    } else {
>> +        secondary_vm_do_failover();
>>       }
>>   }
>>
>> @@ -404,6 +433,12 @@ void *colo_process_incoming_thread(void *opaque)
>>                   continue;
>>               }
>>           }
>> +
>> +        if (failover_request_is_active()) {
>> +            error_report("failover request");
>> +            goto out;
>> +        }
>> +
>>           /* FIXME: This is unnecessary for periodic checkpoint mode */
>>           ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0);
>>           if (ret < 0) {
>> @@ -473,10 +508,11 @@ out:
>>           qemu_fclose(fb);
>>       }
>>       qsb_free(buffer);
>> -
>> -    qemu_mutex_lock_iothread();
>> +    /* The failover BH holds the global lock and will join the colo
>> +    * incoming thread, so we must not take the lock again here,
>> +    * or there will be a deadlock.
>> +    */
>>       colo_release_ram_cache();
>> -    qemu_mutex_unlock_iothread();
>
> OK, I think I understand that - because we know there is a failover request
> active, then it must be holding the lock?
>

Yes, we come here only when failover has happened. The Secondary VM does
failover in a BH while holding the iothread lock, and it enters
migration_incoming_co at the end. migration_incoming_co then waits for the
colo incoming thread to finish. So the colo incoming thread must not try to
take the iothread lock here, or there will be a deadlock.
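
To illustrate the lock ordering described above, here is a minimal standalone
sketch using plain pthreads. It is not QEMU code: big_lock, incoming_thread()
and failover_and_join() are hypothetical stand-ins for the iothread lock, the
tail of colo_process_incoming_thread() and the failover BH. As a
simplification, the worker is created while the lock is already held, whereas
in COLO the thread is already running when failover starts.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global iothread lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set by the worker: true if the lock was held by someone else at cleanup. */
static bool lock_was_busy;

/* Stand-in for the cleanup path of the colo incoming thread. */
static void *incoming_thread(void *opaque)
{
    (void)opaque;

    /*
     * Probe the lock without blocking. A blocking pthread_mutex_lock()
     * here would never return, because the joiner holds the lock until
     * this thread exits.
     */
    int rc = pthread_mutex_trylock(&big_lock);
    if (rc == 0) {
        pthread_mutex_unlock(&big_lock);
    }
    lock_was_busy = (rc == EBUSY);

    /* Cleanup work would run here, without taking the lock. */
    return NULL;
}

/*
 * Stand-in for the failover path: hold the lock and wait for the worker.
 * Returns true if the worker saw the lock as already held.
 */
static bool failover_and_join(void)
{
    pthread_t tid;

    pthread_mutex_lock(&big_lock);
    if (pthread_create(&tid, NULL, incoming_thread, NULL) != 0) {
        pthread_mutex_unlock(&big_lock);
        return false;
    }
    /* Join while still holding the lock, as the failover path does. */
    pthread_join(tid, NULL);
    pthread_mutex_unlock(&big_lock);

    return lock_was_busy;
}
```

Calling failover_and_join() from a main() should return true: the worker
finds the lock held for its whole cleanup, which is why it has to skip the
lock instead of waiting for it.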

> Other than the error message improvement:
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Dave
>

Thanks,
Hailiang

>>
>>       if (mis->to_src_file) {
>>           qemu_fclose(mis->to_src_file);
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>