From: zhanghailiang <zhang.zhanghailiang@huawei.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com,
yunhong.jiang@intel.com, eddie.dong@intel.com,
peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
arei.gonglei@huawei.com, amit.shah@redhat.com,
Yang Hongyang <yanghy@cn.fujitsu.com>,
david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v5 11/29] COLO VMstate: Load VM state into qsb before restore it
Date: Tue, 9 Jun 2015 10:19:54 +0800
Message-ID: <55764D4A.7010606@huawei.com>
In-Reply-To: <20150605180217.GI2139@work-vm>

On 2015/6/6 2:02, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We should cache the device state before restoring it.
>> Besides, we should call qemu_system_reset() before loading the VM state,
>> which ensures the data is intact.
>
> I think the description could be better; to me the important
> point is not that it's a 'cache', but the important point is that you
> don't destroy the state of the secondary until you are sure that you can
> read the whole state from the primary, just in case the primary fails
> in the middle of sending the state.
>
OK, I will fix this description.
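
To make the point above concrete, here is a minimal sketch of the "stage first, then apply" ordering: the secondary's live state is only touched once every announced byte has arrived. The names below (struct stream, struct vm_state, stream_read, apply_checkpoint) are hypothetical stand-ins for illustration, not QEMU APIs; in the patch the corresponding roles are played by qsb_fill_buffer(), qemu_system_reset() and qemu_loadvm_state().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; none of these are QEMU APIs. */
enum { STATE_MAX = 64 };

struct stream {              /* models the channel from the primary */
    const uint8_t *data;
    size_t len;              /* bytes that arrive before the link dies */
    size_t pos;
};

struct vm_state {            /* models the secondary's live device state */
    uint8_t bytes[STATE_MAX];
    size_t len;
};

/* Read up to `want` bytes; a short count models the primary failing mid-send. */
static size_t stream_read(struct stream *s, uint8_t *dst, size_t want)
{
    size_t avail = s->len - s->pos;
    size_t n = want < avail ? want : avail;

    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/*
 * Stage the whole checkpoint in a private buffer first. The live state is
 * reset and overwritten only after all `total_size` bytes were received;
 * on a short read the function returns false and leaves *live untouched.
 */
static bool apply_checkpoint(struct stream *s, size_t total_size,
                             struct vm_state *live)
{
    uint8_t staging[STATE_MAX];

    if (total_size > sizeof(staging)) {
        return false;
    }
    if (stream_read(s, staging, total_size) != total_size) {
        return false;                     /* primary died: keep old state */
    }

    memset(live, 0, sizeof(*live));       /* plays the role of the reset */
    memcpy(live->bytes, staging, total_size);
    live->len = total_size;
    return true;
}
```

With this ordering, a stream that is cut off after two of four announced bytes leaves the old state intact, so the secondary could still take over from its last consistent state.
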
> However, other than that:
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> (I suspect you'll need updates as the qemu migration code updates)
>
Yes, thanks.
> Dave
>
>> Note: If we omit qemu_system_reset(), odd errors can occur.
>> For example, qemu on the slave side crashes and reports:
>>
>> KVM: entry failed, hardware error 0x7
>> EAX=00000000 EBX=0000e000 ECX=00009578 EDX=0000434f
>> ESI=0000fc10 EDI=0000434f EBP=00000000 ESP=00001fca
>> EIP=00009594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0040 00000400 0000ffff 00009300
>> CS =f000 000f0000 0000ffff 00009b00
>> SS =434f 000434f0 0000ffff 00009300
>> DS =434f 000434f0 0000ffff 00009300
>> FS =0000 00000000 0000ffff 00009300
>> GS =0000 00000000 0000ffff 00009300
>> LDT=0000 00000000 0000ffff 00008200
>> TR =0000 00000000 0000ffff 00008b00
>> GDT= 0002dcc8 00000047
>> IDT= 00000000 0000ffff
>> CR0=00000010 CR2=ffffffff CR3=00000000 CR4=00000000
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000000
>> Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90 <f3> 90 fa fc 66 c3 66 53 66 89 c3 66 e8 9d e8 ff ff 66 01 c3 66 89 d8 66 e8 40 e9 ff ff 66
>> ERROR: invalid runstate transition: 'internal-error' -> 'colo'
>>
>> The reason is that some device state is skipped when saving device state to the slave
>> if the corresponding data still has its initial value, such as 0.
>> But the device state on the slave may already hold a different value, so after a round of checkpointing
>> the device state values become inconsistent.
>> This happens when the PVM reboots or the SVM runs ahead of the PVM during startup.
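
The inconsistency described above can be modelled with a small sketch. It assumes a toy device with a single field that is left out of the stream whenever it still has its initial value; struct dev, struct wire, dev_save, dev_reset and dev_load are hypothetical names for illustration and do not correspond to real QEMU devices or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy device; not a real QEMU device model. */
struct dev {
    uint32_t irq_mask;       /* initial value is 0 */
};

/* Wire format: the field is sent only when it differs from its initial value. */
struct wire {
    bool has_irq_mask;
    uint32_t irq_mask;
};

static struct wire dev_save(const struct dev *d)
{
    struct wire w = {
        .has_irq_mask = d->irq_mask != 0,
        .irq_mask = d->irq_mask,
    };
    return w;
}

static void dev_reset(struct dev *d)
{
    d->irq_mask = 0;         /* back to the initial value */
}

/*
 * Load a checkpoint. Fields missing from the stream are left as they are,
 * so without a reset the receiver keeps whatever value it had before.
 */
static void dev_load(struct dev *d, const struct wire *w, bool reset_first)
{
    if (reset_first) {
        dev_reset(d);
    }
    if (w->has_irq_mask) {
        d->irq_mask = w->irq_mask;
    }
}
```

If the primary's field is 0 while the secondary has already changed it (for example because the SVM ran ahead), loading without the reset leaves the secondary's stale value in place; resetting first makes every skipped field fall back to the same initial value on both sides.
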
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>> migration/colo.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 50 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 39cd698..0f61786 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -309,8 +309,10 @@ void *colo_process_incoming_checkpoints(void *opaque)
>> struct colo_incoming *colo_in = opaque;
>> QEMUFile *f = colo_in->file;
>> int fd = qemu_get_fd(f);
>> - QEMUFile *ctl = NULL;
>> + QEMUFile *ctl = NULL, *fb = NULL;
>> int ret;
>> + uint64_t total_size;
>> +
>> colo = qemu_coroutine_self();
>> assert(colo != NULL);
>>
>> @@ -325,10 +327,17 @@ void *colo_process_incoming_checkpoints(void *opaque)
>> goto out;
>> }
>>
>> + colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
>> + if (colo_buffer == NULL) {
>> + error_report("Failed to allocate colo buffer!");
>> + goto out;
>> + }
>> +
>> ret = colo_ctl_put(ctl, COLO_CHECPOINT_READY);
>> if (ret < 0) {
>> goto out;
>> }
>> +
>> qemu_mutex_lock_iothread();
>> /* in COLO mode, slave is runing, so start the vm */
>> vm_start();
>> @@ -364,7 +373,18 @@ void *colo_process_incoming_checkpoints(void *opaque)
>> }
>> trace_colo_receive_message("COLO_CHECKPOINT_SEND");
>>
>> - /*TODO Load VM state */
>> + /* read the VM state total size first */
>> + ret = colo_ctl_get_value(f, &total_size);
>> + if (ret < 0) {
>> + goto out;
>> + }
>> +
>> + /* read vm device state into colo buffer */
>> + ret = qsb_fill_buffer(colo_buffer, f, total_size);
>> + if (ret != total_size) {
>> + error_report("can't get all migration data");
>> + goto out;
>> + }
>>
>> ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
>> if (ret < 0) {
>> @@ -372,6 +392,22 @@ void *colo_process_incoming_checkpoints(void *opaque)
>> }
>> trace_colo_receive_message("COLO_CHECKPOINT_RECEIVED");
>>
>> + /* open colo buffer for read */
>> + fb = qemu_bufopen("r", colo_buffer);
>> + if (!fb) {
>> + error_report("can't open colo buffer for read");
>> + goto out;
>> + }
>> +
>> + qemu_mutex_lock_iothread();
>> + qemu_system_reset(VMRESET_SILENT);
>> + if (qemu_loadvm_state(fb) < 0) {
>> + error_report("COLO: loadvm failed");
>> + qemu_mutex_unlock_iothread();
>> + goto out;
>> + }
>> + qemu_mutex_unlock_iothread();
>> +
>> /* TODO: flush vm state */
>>
>> ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
>> @@ -384,14 +420,25 @@ void *colo_process_incoming_checkpoints(void *opaque)
>> vm_start();
>> qemu_mutex_unlock_iothread();
>> trace_colo_vm_state_change("stop", "start");
>> -}
>> +
>> + qemu_fclose(fb);
>> + fb = NULL;
>> + }
>>
>> out:
>> colo = NULL;
>> +
>> + if (fb) {
>> + qemu_fclose(fb);
>> + }
>> +
>> release_ram_cache();
>> if (ctl) {
>> qemu_fclose(ctl);
>> }
>> +
>> + qsb_free(colo_buffer);
>> +
>> loadvm_exit_colo();
>>
>> return NULL;
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK