References: <1448357149-17572-1-git-send-email-zhang.zhanghailiang@huawei.com> <1448357149-17572-25-git-send-email-zhang.zhanghailiang@huawei.com> <20151210185048.GK2570@work-vm>
From: Hailiang Zhang
Message-ID: <566A88FB.60000@huawei.com>
Date: Fri, 11 Dec 2015 16:27:39 +0800
MIME-Version: 1.0
In-Reply-To: <20151210185048.GK2570@work-vm>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v11 24/39] COLO: Implement failover work for Secondary VM
To: "Dr. David Alan Gilbert"
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com, hongyang.yang@easystack.cn

On 2015/12/11 2:50, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> If users require SVM to takeover work, colo incoming thread should
>> exit from loop while failover BH helps backing to migration incoming
>> coroutine.
>>
>> Signed-off-by: zhanghailiang
>> Signed-off-by: Li Zhijian
>> ---
>>  migration/colo.c | 42 +++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 39 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 7a42fc6..f31e957 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -46,6 +46,33 @@ static bool colo_runstate_is_stopped(void)
>>      return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
>>  }
>>
>> +static void secondary_vm_do_failover(void)
>> +{
>> +    int old_state;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +
>> +    migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
>> +                      MIGRATION_STATUS_COMPLETED);
>> +
>> +    if (!autostart) {
>> +        error_report("\"-S\" qemu option will be ignored in secondary side");
>> +        /* recover runstate to normal migration finish state */
>> +        autostart = true;
>> +    }
>
> You might find libvirt will need something different for it to be
> involved during the failover; but for now OK.
>
>> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>> +                                   FAILOVER_STATUS_COMPLETED);
>> +    if (old_state != FAILOVER_STATUS_HANDLING) {
>> +        error_report("Serious error while do failover for secondary VM,"
>> +                     "old_state: %d", old_state);
>
> Same suggestion as previous patch just to improve the error message.
>

OK, I will fix it in the next version.

>> +        return;
>> +    }
>> +    /* For Secondary VM, jump to incoming co */
>> +    if (mis->migration_incoming_co) {
>> +        qemu_coroutine_enter(mis->migration_incoming_co, NULL);
>> +    }
>> +}
>> +
>>  static void primary_vm_do_failover(void)
>>  {
>>      MigrationState *s = migrate_get_current();
>> @@ -74,6 +101,8 @@ void colo_do_failover(MigrationState *s)
>>
>>      if (get_colo_mode() == COLO_MODE_PRIMARY) {
>>          primary_vm_do_failover();
>> +    } else {
>> +        secondary_vm_do_failover();
>>      }
>>  }
>>
>> @@ -404,6 +433,12 @@ void *colo_process_incoming_thread(void *opaque)
>>                  continue;
>>              }
>>          }
>> +
>> +        if (failover_request_is_active()) {
>> +            error_report("failover request");
>> +            goto out;
>> +        }
>> +
>>          /* FIXME: This is unnecessary for periodic checkpoint mode */
>>          ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0);
>>          if (ret < 0) {
>> @@ -473,10 +508,11 @@ out:
>>          qemu_fclose(fb);
>>      }
>>      qsb_free(buffer);
>> -
>> -    qemu_mutex_lock_iothread();
>> +    /* Here, we can ensure BH is hold the global lock, and will join colo
>> +     * incoming thread, so here it is not necessary to lock here again,
>> +     * or there will be a deadlock error.
>> +     */
>>      colo_release_ram_cache();
>> -    qemu_mutex_unlock_iothread();
>
> OK, I think I understand that - because we know there is a failover request
> active, then it must be holding the lock?
>

Yes, we only get here when a failover has happened. The Secondary VM does
the failover in a BH while holding the iothread lock, and at the end the BH
enters migration_incoming_co. That coroutine then waits for the colo
incoming thread to finish. So the colo incoming thread must not try to take
the iothread lock here, or there will be a deadlock.

> Other than the error message improvement:
>
> Reviewed-by: Dr. David Alan Gilbert
>
> Dave
>

Thanks,
Hailiang

>>
>>      if (mis->to_src_file) {
>>          qemu_fclose(mis->to_src_file);
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
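
To make the locking argument above concrete, here is a minimal standalone
sketch using plain pthreads. It is not QEMU code: big_lock, failover_requested,
incoming_thread() and run_failover_demo() are hypothetical stand-ins for the
iothread lock, the failover request, the colo incoming thread and the failover
BH. It assumes, as described above, that the side doing the failover holds the
lock while it waits for the thread to finish.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only, not QEMU APIs. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* "iothread lock" */
static atomic_bool failover_requested;
static bool cache_released;
static int trylock_result;

/* Plays the role of the colo incoming thread. */
static void *incoming_thread(void *opaque)
{
    (void)opaque;

    /* Checkpoint loop: leave it once a failover has been requested. */
    while (!atomic_load(&failover_requested)) {
        sched_yield();
    }

    /*
     * Cleanup path. The requester already holds big_lock and is blocked
     * joining us, so a blocking pthread_mutex_lock() here would never
     * return. trylock only demonstrates that the lock is busy.
     */
    trylock_result = pthread_mutex_trylock(&big_lock);
    if (trylock_result == 0) {
        pthread_mutex_unlock(&big_lock);
    }

    /* Release resources without taking the lock. */
    cache_released = true;
    return NULL;
}

/* Plays the role of the failover BH; returns the worker's trylock result. */
static int run_failover_demo(void)
{
    pthread_t tid;

    atomic_store(&failover_requested, false);
    cache_released = false;
    trylock_result = 0;

    if (pthread_create(&tid, NULL, incoming_thread, NULL) != 0) {
        return -1;
    }

    pthread_mutex_lock(&big_lock);           /* BH runs with the lock held */
    atomic_store(&failover_requested, true); /* ask the worker to exit */
    pthread_join(tid, NULL);                 /* wait for it, lock still held */
    pthread_mutex_unlock(&big_lock);

    return trylock_result;
}
```

Calling run_failover_demo() should return EBUSY: the worker finds the lock
already taken by the side that is waiting for it, which is why the cleanup
path in this sketch does not lock.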