qemu-devel.nongnu.org archive mirror
From: zhanghailiang <zhang.zhanghailiang@huawei.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com,
	yunhong.jiang@intel.com, eddie.dong@intel.com,
	peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
	arei.gonglei@huawei.com, amit.shah@redhat.com,
	Yang Hongyang <yanghy@cn.fujitsu.com>,
	david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v5 11/29] COLO VMstate: Load VM state into qsb before restore it
Date: Tue, 9 Jun 2015 10:19:54 +0800
Message-ID: <55764D4A.7010606@huawei.com>
In-Reply-To: <20150605180217.GI2139@work-vm>

On 2015/6/6 2:02, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We should cache the device state before restoring it.
>> In addition, we should call qemu_system_reset() before loading the VM state,
>> which ensures the data is intact.
>
> I think the description could be better; to me the important
> point is not that it's a 'cache', but the important point is that you
> don't destroy the state of the secondary until you are sure that you can
> read the whole state from the primary, just in case the primary fails
> in the middle of sending the state.
>

OK, I will fix this description.
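
To make the point above concrete, here is a minimal standalone sketch of the
"stage first, apply later" idea: the whole checkpoint is read into a staging
buffer, and the live state is reset and overwritten only once the read is
complete. This is not the QEMU code. The names stream, device_state,
stream_read and apply_checkpoint are hypothetical stand-ins for the incoming
QEMUFile, the secondary's device state, qsb_fill_buffer() and the
reset-plus-loadvm step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; none of these names are QEMU APIs. */
enum { STATE_SIZE = 8 };

struct stream {                 /* pretend channel from the primary */
    const unsigned char *data;
    size_t len;                 /* bytes available before the link dies */
    size_t pos;
};

struct device_state {           /* pretend live state of the secondary */
    unsigned char regs[STATE_SIZE];
};

/* Read up to 'want' bytes; returns the number of bytes actually obtained. */
static size_t stream_read(struct stream *s, unsigned char *dst, size_t want)
{
    size_t avail = s->len - s->pos;
    size_t n = want < avail ? want : avail;

    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/*
 * Stage the whole checkpoint first and touch 'live' only once it is complete.
 * Returns true if the checkpoint was applied, false if it was discarded.
 */
static bool apply_checkpoint(struct stream *s, struct device_state *live)
{
    unsigned char staging[STATE_SIZE];

    assert(s != NULL && live != NULL);
    if (stream_read(s, staging, sizeof(staging)) != sizeof(staging)) {
        return false;           /* short read: live state is left untouched */
    }
    memset(live, 0, sizeof(*live));                /* analogous to the reset */
    memcpy(live->regs, staging, sizeof(staging));  /* analogous to loadvm */
    return true;
}
```

If the stream ends early, apply_checkpoint() returns false and the secondary
still holds its previous, self-consistent state, so it can take over.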

> However, other than that:
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> (I suspect you'll need updates as the qemu migration code updates)
>

Yes, thanks.

> Dave
>
>> Note: If we drop the qemu_system_reset() call, odd errors occur.
>> For example, qemu on the slave side crashes and reports:
>>
>> KVM: entry failed, hardware error 0x7
>> EAX=00000000 EBX=0000e000 ECX=00009578 EDX=0000434f
>> ESI=0000fc10 EDI=0000434f EBP=00000000 ESP=00001fca
>> EIP=00009594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0040 00000400 0000ffff 00009300
>> CS =f000 000f0000 0000ffff 00009b00
>> SS =434f 000434f0 0000ffff 00009300
>> DS =434f 000434f0 0000ffff 00009300
>> FS =0000 00000000 0000ffff 00009300
>> GS =0000 00000000 0000ffff 00009300
>> LDT=0000 00000000 0000ffff 00008200
>> TR =0000 00000000 0000ffff 00008b00
>> GDT=     0002dcc8 00000047
>> IDT=     00000000 0000ffff
>> CR0=00000010 CR2=ffffffff CR3=00000000 CR4=00000000
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000000
>> Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90 <f3> 90 fa fc 66 c3 66 53 66 89 c3 66 e8 9d e8 ff ff 66 01 c3 66 89 d8 66 e8 40 e9 ff ff 66
>> ERROR: invalid runstate transition: 'internal-error' -> 'colo'
>>
>> The reason is that some device state is skipped when saving device state to the slave
>> if the corresponding data still has its initial value, such as 0.
>> But the device state on the slave may already hold some other value, so after a round of checkpointing
>> the device state values are inconsistent.
>> This happens when the PVM reboots or the SVM runs ahead of the PVM during startup.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>   migration/colo.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 50 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 39cd698..0f61786 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -309,8 +309,10 @@ void *colo_process_incoming_checkpoints(void *opaque)
>>       struct colo_incoming *colo_in = opaque;
>>       QEMUFile *f = colo_in->file;
>>       int fd = qemu_get_fd(f);
>> -    QEMUFile *ctl = NULL;
>> +    QEMUFile *ctl = NULL, *fb = NULL;
>>       int ret;
>> +    uint64_t total_size;
>> +
>>       colo = qemu_coroutine_self();
>>       assert(colo != NULL);
>>
>> @@ -325,10 +327,17 @@ void *colo_process_incoming_checkpoints(void *opaque)
>>           goto out;
>>       }
>>
>> +    colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
>> +    if (colo_buffer == NULL) {
>> +        error_report("Failed to allocate colo buffer!");
>> +        goto out;
>> +    }
>> +
>>       ret = colo_ctl_put(ctl, COLO_CHECPOINT_READY);
>>       if (ret < 0) {
>>           goto out;
>>       }
>> +
>>       qemu_mutex_lock_iothread();
>>       /* in COLO mode, slave is runing, so start the vm */
>>       vm_start();
>> @@ -364,7 +373,18 @@ void *colo_process_incoming_checkpoints(void *opaque)
>>           }
>>           trace_colo_receive_message("COLO_CHECKPOINT_SEND");
>>
>> -        /*TODO Load VM state */
>> +        /* read the VM state total size first */
>> +        ret = colo_ctl_get_value(f, &total_size);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +
>> +        /* read vm device state into colo buffer */
>> +        ret = qsb_fill_buffer(colo_buffer, f, total_size);
>> +        if (ret != total_size) {
>> +            error_report("can't get all migration data");
>> +            goto out;
>> +        }
>>
>>           ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
>>           if (ret < 0) {
>> @@ -372,6 +392,22 @@ void *colo_process_incoming_checkpoints(void *opaque)
>>           }
>>           trace_colo_receive_message("COLO_CHECKPOINT_RECEIVED");
>>
>> +        /* open colo buffer for read */
>> +        fb = qemu_bufopen("r", colo_buffer);
>> +        if (!fb) {
>> +            error_report("can't open colo buffer for read");
>> +            goto out;
>> +        }
>> +
>> +        qemu_mutex_lock_iothread();
>> +        qemu_system_reset(VMRESET_SILENT);
>> +        if (qemu_loadvm_state(fb) < 0) {
>> +            error_report("COLO: loadvm failed");
>> +            qemu_mutex_unlock_iothread();
>> +            goto out;
>> +        }
>> +        qemu_mutex_unlock_iothread();
>> +
>>           /* TODO: flush vm state */
>>
>>           ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
>> @@ -384,14 +420,25 @@ void *colo_process_incoming_checkpoints(void *opaque)
>>           vm_start();
>>           qemu_mutex_unlock_iothread();
>>           trace_colo_vm_state_change("stop", "start");
>> -}
>> +
>> +        qemu_fclose(fb);
>> +        fb = NULL;
>> +    }
>>
>>   out:
>>       colo = NULL;
>> +
>> +    if (fb) {
>> +        qemu_fclose(fb);
>> +    }
>> +
>>       release_ram_cache();
>>       if (ctl) {
>>           qemu_fclose(ctl);
>>       }
>> +
>> +    qsb_free(colo_buffer);
>> +
>>       loadvm_exit_colo();
>>
>>       return NULL;
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
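
For the reset issue described in the quoted commit message, a small standalone
model may help. It assumes a saver that omits any field still at its initial
value 0, and a loader that overwrites only the fields it receives. Without a
reset before loading, fields the primary skipped keep whatever the secondary
had. The names dev, snapshot, save_sparse and load_sparse are hypothetical and
are not QEMU's vmstate interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model for illustration; not QEMU's vmstate code. */
enum { NFIELDS = 4 };

struct dev {
    int field[NFIELDS];
};

/* One saved record: which fields were sent, and their values. */
struct snapshot {
    bool present[NFIELDS];
    int value[NFIELDS];
};

/* Save only the fields that differ from the initial value 0. */
static void save_sparse(const struct dev *src, struct snapshot *out)
{
    assert(src != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    for (size_t i = 0; i < NFIELDS; i++) {
        if (src->field[i] != 0) {
            out->present[i] = true;
            out->value[i] = src->field[i];
        }
    }
}

/* Overwrite only the fields that were sent; optionally reset to 0 first. */
static void load_sparse(struct dev *dst, const struct snapshot *in,
                        bool reset_first)
{
    assert(dst != NULL && in != NULL);
    if (reset_first) {
        memset(dst, 0, sizeof(*dst));   /* analogous to qemu_system_reset() */
    }
    for (size_t i = 0; i < NFIELDS; i++) {
        if (in->present[i]) {
            dst->field[i] = in->value[i];
        }
    }
}
```

With a primary holding {0, 5, 0, 7} and a secondary holding {9, 9, 9, 9},
loading without the reset leaves the secondary at {9, 5, 9, 7}, while loading
with the reset makes both sides equal.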
