From: Shunsuke Kurumatani <kurumatani.shunsuke@lab.ntt.co.jp>
To: Yang Hongyang <yanghy@cn.fujitsu.com>, qemu-devel@nongnu.org
Cc: GuiJianfeng@cn.fujitsu.com, yunhong.jiang@intel.com,
	eddie.dong@intel.com, dgilbert@redhat.com,
	mrhines@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [RFC PATCH v2 13/23] COLO ctl: implement colo save
Date: Wed, 08 Oct 2014 19:23:02 +0900
Message-ID: <54351086.3060509@lab.ntt.co.jp>
In-Reply-To: <1411464235-5653-14-git-send-email-yanghy@cn.fujitsu.com>

Hi,

I tried out this exciting COLO patch series. However, this patch
causes QEMU to terminate abnormally in my environment. Although I
think it may be a known issue, the details and the presumed cause are
described below:


On 2014/09/23 18:23, Yang Hongyang wrote:
> implement colo save
> 
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>   migration-colo.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 53 insertions(+), 7 deletions(-)
> 
> diff --git a/migration-colo.c b/migration-colo.c
> index 2e478e9..d99342a 100644
> --- a/migration-colo.c
> +++ b/migration-colo.c
> @@ -13,6 +13,7 @@
>   #include "block/coroutine.h"
>   #include "hw/qdev-core.h"
>   #include "qemu/timer.h"
> +#include "sysemu/sysemu.h"
>   #include "migration/migration-colo.h"
>   #include <sys/ioctl.h>
>   #include "qemu/error-report.h"
> @@ -106,12 +107,12 @@ static int colo_compare(void)
>       return ioctl(comp_fd, COMP_IOCTWAIT, 250);
>   }
>   
> -static __attribute__((unused)) int colo_compare_flush(void)
> +static int colo_compare_flush(void)
>   {
>       return ioctl(comp_fd, COMP_IOCTFLUSH, 1);
>   }
>   
> -static __attribute__((unused)) int colo_compare_resume(void)
> +static int colo_compare_resume(void)
>   {
>       return ioctl(comp_fd, COMP_IOCTRESUME, 1);
>   }
> @@ -200,6 +201,9 @@ static bool colo_is_master(void)
>   static int do_colo_transaction(MigrationState *s, QEMUFile *control)
>   {
>       int ret;
> +    uint8_t *buf;
> +    size_t size;
> +    QEMUFile *trans = NULL;
>   
>       ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
>       if (ret) {
> @@ -211,30 +215,73 @@ static int do_colo_transaction(MigrationState *s, QEMUFile *control)
>           goto out;
>       }
>   
> -    /* TODO: suspend and save vm state to colo buffer */
> +    /* open colo buffer for write */
> +    trans = qemu_bufopen("w", NULL);
> +    if (!trans) {
> +        error_report("Open colo buffer for write failed");
> +        goto out;
> +    }
> +
> +    /* suspend and save vm state to colo buffer */
> +    qemu_mutex_lock_iothread();
> +    vm_stop_force_state(RUN_STATE_COLO);
> +    qemu_mutex_unlock_iothread();
> +    /* Disable block migration */
> +    s->params.blk = 0;
> +    s->params.shared = 0;
> +    qemu_savevm_state_begin(trans, &s->params);
> +    qemu_savevm_state_complete(trans);

This line causes QEMU to abort immediately after a COLO migration
starts. If I'm not mistaken, the abort happens because the mutex lock
is not held when qemu_savevm_state_complete() is called. The abort
went away when I took the mutex lock around the call to
qemu_savevm_state_complete().
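
To illustrate the locking pattern I mean, here is a minimal,
self-contained sketch. It is not QEMU code: big_lock,
save_state_complete() and the two checkpoint_*() helpers are
hypothetical stand-ins, and the "fails without the lock" behaviour
only models my guess about the cause. In the patch itself this would
correspond to wrapping the save call in qemu_mutex_lock_iothread() /
qemu_mutex_unlock_iothread(), assuming that is the lock in question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not QEMU code: a "big lock" plus a flag that
 * records whether it is currently held (single-threaded model only). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool big_lock_held;

static void big_lock_acquire(void)
{
    pthread_mutex_lock(&big_lock);
    big_lock_held = true;
}

static void big_lock_release(void)
{
    big_lock_held = false;
    pthread_mutex_unlock(&big_lock);
}

/* Stand-in for a save routine that expects its caller to hold the lock.
 * Returns 0 on success and -1 when called without the lock (the real
 * code may abort instead; that is an assumption of this sketch). */
static int save_state_complete(void)
{
    return big_lock_held ? 0 : -1;
}

/* Pattern as posted: the lock is dropped before the save call. */
static int checkpoint_unlocked(void)
{
    big_lock_acquire();
    /* stopping the VM would happen here */
    big_lock_release();
    return save_state_complete();
}

/* Pattern that worked for me: hold the lock across the save call. */
static int checkpoint_locked(void)
{
    int ret;

    big_lock_acquire();
    ret = save_state_complete();
    big_lock_release();
    return ret;
}
```

In this model checkpoint_unlocked() reports the failure and
checkpoint_locked() succeeds, which mirrors what I observed after
taking the lock around the call.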

Thanks,
Shunsuke


> +
> +    qemu_fflush(trans);
>   
>       ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
>       if (ret) {
>           goto out;
>       }
>   
> -    /* TODO: send vmstate to slave */
> +    /* send vmstate to slave */
> +
> +    /* we send the total size of the vmstate first */
> +    size = qsb_get_length(qemu_buf_get(trans));
> +    ret = colo_ctl_put(s->file, size);
> +    if (ret) {
> +        goto out;
> +    }
> +
> +    buf = g_malloc(size);
> +    qsb_get_buffer(qemu_buf_get(trans), 0, size, &buf);
> +    qemu_put_buffer(s->file, buf, size);
> +    g_free(buf);
> +    ret = qemu_file_get_error(s->file);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +    qemu_fflush(s->file);
>   
>       ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
>       if (ret) {
>           goto out;
>       }
>   
> -    /* TODO: Flush network etc. */
> +    /* Flush network etc. */
> +    colo_compare_flush();
>   
>       ret = colo_ctl_get(control, COLO_CHECKPOINT_LOADED);
>       if (ret) {
>           goto out;
>       }
>   
> -    /* TODO: resume master */
> +    colo_compare_resume();
> +    ret = 0;
>   
>   out:
> +    if (trans)
> +        qemu_fclose(trans);
> +    /* resume master */
> +    qemu_mutex_lock_iothread();
> +    vm_start();
> +    qemu_mutex_unlock_iothread();
> +
>       return ret;
>   }
>   
> @@ -289,7 +336,6 @@ static void *colo_thread(void *opaque)
>           }
>   
>           /* start a colo checkpoint */
> -
>           if (do_colo_transaction(s, colo_control)) {
>               goto out;
>           }
> 
