From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39839) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z299T-00077p-FI for qemu-devel@nongnu.org; Mon, 08 Jun 2015 22:20:32 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z299P-0003vw-2A for qemu-devel@nongnu.org; Mon, 08 Jun 2015 22:20:31 -0400 Received: from szxga01-in.huawei.com ([58.251.152.64]:59881) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z299N-0003lj-PT for qemu-devel@nongnu.org; Mon, 08 Jun 2015 22:20:26 -0400 Message-ID: <55764D4A.7010606@huawei.com> Date: Tue, 9 Jun 2015 10:19:54 +0800 From: zhanghailiang MIME-Version: 1.0 References: <1432196001-10352-1-git-send-email-zhang.zhanghailiang@huawei.com> <1432196001-10352-12-git-send-email-zhang.zhanghailiang@huawei.com> <20150605180217.GI2139@work-vm> In-Reply-To: <20150605180217.GI2139@work-vm> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH COLO-Frame v5 11/29] COLO VMstate: Load VM state into qsb before restore it List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, arei.gonglei@huawei.com, amit.shah@redhat.com, Yang Hongyang , david@gibson.dropbear.id.au On 2015/6/6 2:02, Dr. David Alan Gilbert wrote: > * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote: >> We should cache the device state before restore it, >> besides, we should call qemu_system_reset() before load VM state, >> which can ensure the data is intact. > > I think the description could be better; to me the important > point is not that it's a 'cache', but the important point is that you > don't destroy the state of the secondary until you are sure that you can > read the whole state from the primary, just in case the primary fails > in the middle of sending the state. > OK, I will fix this description. > However, other than that: > > Reviewed-by: Dr. David Alan Gilbert > > (I suspect you'll need updates as the qemu migration code updates) > Yes, thanks. > Dave > >> Note: If we discard qemu_system_reset(), there will be some odd error, >> For exmple, qemu in slave side crashes and reports: >> >> KVM: entry failed, hardware error 0x7 >> EAX=00000000 EBX=0000e000 ECX=00009578 EDX=0000434f >> ESI=0000fc10 EDI=0000434f EBP=00000000 ESP=00001fca >> EIP=00009594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 >> ES =0040 00000400 0000ffff 00009300 >> CS =f000 000f0000 0000ffff 00009b00 >> SS =434f 000434f0 0000ffff 00009300 >> DS =434f 000434f0 0000ffff 00009300 >> FS =0000 00000000 0000ffff 00009300 >> GS =0000 00000000 0000ffff 00009300 >> LDT=0000 00000000 0000ffff 00008200 >> TR =0000 00000000 0000ffff 00008b00 >> GDT= 0002dcc8 00000047 >> IDT= 00000000 0000ffff >> CR0=00000010 CR2=ffffffff CR3=00000000 CR4=00000000 >> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 >> DR6=00000000ffff0ff0 DR7=0000000000000400 >> EFER=0000000000000000 >> Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90 90 fa fc 66 c3 66 53 66 89 c3 66 e8 9d e8 ff ff 66 01 c3 66 89 d8 66 e8 40 e9 ff ff 66 >> ERROR: invalid runstate transition: 'internal-error' -> 'colo' >> >> The reason is, some of the device state will be ignored when saving device state to slave, >> if the corresponding data is in its initial value, such as 0. >> But the device state in slave maybe in initialized value, after a loop of checkpoint, >> there will be inconsistent for the value of device state. >> This will happen when the PVM reboot or SVM run ahead of PVM in the startup process. >> >> Signed-off-by: zhanghailiang >> Signed-off-by: Yang Hongyang >> Signed-off-by: Gonglei >> --- >> migration/colo.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++--- >> 1 file changed, 50 insertions(+), 3 deletions(-) >> >> diff --git a/migration/colo.c b/migration/colo.c >> index 39cd698..0f61786 100644 >> --- a/migration/colo.c >> +++ b/migration/colo.c >> @@ -309,8 +309,10 @@ void *colo_process_incoming_checkpoints(void *opaque) >> struct colo_incoming *colo_in = opaque; >> QEMUFile *f = colo_in->file; >> int fd = qemu_get_fd(f); >> - QEMUFile *ctl = NULL; >> + QEMUFile *ctl = NULL, *fb = NULL; >> int ret; >> + uint64_t total_size; >> + >> colo = qemu_coroutine_self(); >> assert(colo != NULL); >> >> @@ -325,10 +327,17 @@ void *colo_process_incoming_checkpoints(void *opaque) >> goto out; >> } >> >> + colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE); >> + if (colo_buffer == NULL) { >> + error_report("Failed to allocate colo buffer!"); >> + goto out; >> + } >> + >> ret = colo_ctl_put(ctl, COLO_CHECPOINT_READY); >> if (ret < 0) { >> goto out; >> } >> + >> qemu_mutex_lock_iothread(); >> /* in COLO mode, slave is runing, so start the vm */ >> vm_start(); >> @@ -364,7 +373,18 @@ void *colo_process_incoming_checkpoints(void *opaque) >> } >> trace_colo_receive_message("COLO_CHECKPOINT_SEND"); >> >> - /*TODO Load VM state */ >> + /* read the VM state total size first */ >> + ret = colo_ctl_get_value(f, &total_size); >> + if (ret < 0) { >> + goto out; >> + } >> + >> + /* read vm device state into colo buffer */ >> + ret = qsb_fill_buffer(colo_buffer, f, total_size); >> + if (ret != total_size) { >> + error_report("can't get all migration data"); >> + goto out; >> + } >> >> ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED); >> if (ret < 0) { >> @@ -372,6 +392,22 @@ void *colo_process_incoming_checkpoints(void *opaque) >> } >> trace_colo_receive_message("COLO_CHECKPOINT_RECEIVED"); >> >> + /* open colo buffer for read */ >> + fb = qemu_bufopen("r", colo_buffer); >> + if (!fb) { >> + error_report("can't open colo buffer for read"); >> + goto out; >> + } >> + >> + qemu_mutex_lock_iothread(); >> + qemu_system_reset(VMRESET_SILENT); >> + if (qemu_loadvm_state(fb) < 0) { >> + error_report("COLO: loadvm failed"); >> + qemu_mutex_unlock_iothread(); >> + goto out; >> + } >> + qemu_mutex_unlock_iothread(); >> + >> /* TODO: flush vm state */ >> >> ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED); >> @@ -384,14 +420,25 @@ void *colo_process_incoming_checkpoints(void *opaque) >> vm_start(); >> qemu_mutex_unlock_iothread(); >> trace_colo_vm_state_change("stop", "start"); >> -} >> + >> + qemu_fclose(fb); >> + fb = NULL; >> + } >> >> out: >> colo = NULL; >> + >> + if (fb) { >> + qemu_fclose(fb); >> + } >> + >> release_ram_cache(); >> if (ctl) { >> qemu_fclose(ctl); >> } >> + >> + qsb_free(colo_buffer); >> + >> loadvm_exit_colo(); >> >> return NULL; >> -- >> 1.7.12.4 >> >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > . >