From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>,
Jiang Yunhong <yunhong.jiang@intel.com>,
Dong Eddie <eddie.dong@intel.com>, Ye Wei <wei.ye1987@gmail.com>,
xen-devl <xen-devel@lists.xen.org>,
Hong Tao <bobby.hong@huawei.com>, Xu Yao <xuyao.xu@huawei.com>,
Shriram Rajagopalan <rshriram@cs.ubc.ca>
Subject: Re: [RFC Patch v2 15/16] xc_domain_save: implement save_callbacks for colo
Date: Thu, 11 Jul 2013 14:52:46 +0100
Message-ID: <51DEB8AE.6020102@citrix.com>
In-Reply-To: <1373531748-12547-16-git-send-email-wency@cn.fujitsu.com>
On 11/07/13 09:35, Wen Congyang wrote:
> Add a new save callback:
> 1. post_sendstate(): SVM will run only when XC_SAVE_ID_LAST_CHECKPOINT is
> sent to slaver. But currently we only send XC_SAVE_ID_LAST_CHECKPOINT when
> we do live migration. Add this callback so that we can send it from this
> callback.
>
> Update some callbacks for colo:
> 1. suspend(): In colo mode, both PVM and SVM are running, so we should
> suspend both PVM and SVM.
> Communicate with slaver like this:
> a. write "continue" to notify slaver to suspend SVM
> b. suspend PVM and SVM
> c. slaver writes "suspend" to tell master that SVM is suspended
> 2. postcopy(): In colo mode, both PVM and SVM are running, and we have
> suspended both PVM and SVM. So we should resume PVM and SVM.
> Communicate with slaver like this:
> a. write "resume" to notify slaver to resume SVM
> b. resume PVM and SVM
> c. slaver writes "resume" to tell master that SVM is resumed
> 3. checkpoint(): In colo mode, we do a new checkpoint only when the output
> packets from PVM and SVM differ. We block in this callback and return
> when an output packet differs.
>
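The handshakes described above exchange fixed-length ASCII tokens ("continue", "suspend", "resume") with no terminator or length prefix, so each side has to read exactly as many bytes as the token it expects. A minimal sketch of that exchange follows. The helpers write_all(), read_all(), send_token() and expect_token() are hypothetical names for illustration only; the patch itself uses libxc's write_exact() and read_exact().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for libxc's write_exact(): write every byte. */
static int write_all(int fd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical stand-in for libxc's read_exact(): read every byte. */
static int read_all(int fd, void *data, size_t len)
{
    char *p = data;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)     /* peer closed the stream in the middle of a token */
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send a token such as "continue" or "resume", without a trailing NUL. */
static int send_token(int fd, const char *token)
{
    return write_all(fd, token, strlen(token));
}

/* Read exactly strlen(expected) bytes and check that they match. */
static int expect_token(int fd, const char *expected)
{
    char buf[16];
    size_t len = strlen(expected);

    assert(len < sizeof(buf));
    if (read_all(fd, buf, len) < 0)
        return -1;
    return memcmp(buf, expected, len) == 0 ? 0 : -1;
}
```

Deriving the read length from the expected token keeps the buffer size and the comparison in one place, rather than repeating literal lengths such as 6, 7 and 8 at each call site.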
> Signed-off-by: Ye Wei <wei.ye1987@gmail.com>
> Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> tools/libxc/xc_domain_save.c | 17 ++
> tools/libxc/xenguest.h | 3 +
> tools/python/xen/lowlevel/checkpoint/checkpoint.c | 302 ++++++++++++++++++++-
> tools/python/xen/lowlevel/checkpoint/checkpoint.h | 1 +
> 4 files changed, 319 insertions(+), 4 deletions(-)
>
> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> index b477188..8f84c9b 100644
> --- a/tools/libxc/xc_domain_save.c
> +++ b/tools/libxc/xc_domain_save.c
> @@ -1785,6 +1785,23 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
> }
> }
>
> + /* Flush last write and discard cache for file. */
> + if ( outbuf_flush(xch, ob, io_fd) < 0 ) {
> + PERROR("Error when flushing output buffer");
> + rc = 1;
> + }
> +
> + discard_file_cache(xch, io_fd, 1 /* flush */);
> +
> + if ( callbacks->post_sendstate )
> + {
> + if ( callbacks->post_sendstate(callbacks->data) < 0)
> + {
> + PERROR("Error: post_sendstate()\n");
> + goto out;
> + }
> + }
> +
> /* Zero terminate */
> i = 0;
> if ( wrexact(io_fd, &i, sizeof(int)) )
> diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
> index 4bb444a..9d7d03c 100644
> --- a/tools/libxc/xenguest.h
> +++ b/tools/libxc/xenguest.h
> @@ -72,6 +72,9 @@ struct save_callbacks {
> */
> int (*toolstack_save)(uint32_t domid, uint8_t **buf, uint32_t *len, void *data);
>
> + /* called before Zero terminate is sent */
> + int (*post_sendstate)(void *data);
> +
> /* to be provided as the last argument to each callback function */
> void* data;
> };
> diff --git a/tools/python/xen/lowlevel/checkpoint/checkpoint.c b/tools/python/xen/lowlevel/checkpoint/checkpoint.c
> index ec14b27..28bdb23 100644
> --- a/tools/python/xen/lowlevel/checkpoint/checkpoint.c
> +++ b/tools/python/xen/lowlevel/checkpoint/checkpoint.c
> @@ -1,14 +1,22 @@
> /* python bridge to checkpointing API */
>
> #include <Python.h>
> +#include <sys/wait.h>
I can't see anything using this header file, which is good, as otherwise I
would still tell you that a Python module should not be using any of its
contents.
~Andrew
>
> #include <xenstore.h>
> #include <xenctrl.h>
> +#include <xc_private.h>
> +#include <xg_save_restore.h>
>
> #include "checkpoint.h"
>
> #define PKG "xen.lowlevel.checkpoint"
>
> +#define COMP_IOC_MAGIC 'k'
> +#define COMP_IOCTWAIT _IO(COMP_IOC_MAGIC, 0)
> +#define COMP_IOCTFLUSH _IO(COMP_IOC_MAGIC, 1)
> +#define COMP_IOCTRESUME _IO(COMP_IOC_MAGIC, 2)
> +
> static PyObject* CheckpointError;
>
> typedef struct {
> @@ -25,11 +33,15 @@ typedef struct {
> PyObject* setup_cb;
>
> PyThreadState* threadstate;
> + int colo;
> + int first_time;
> + int dev_fd;
> } CheckpointObject;
>
> static int suspend_trampoline(void* data);
> static int postcopy_trampoline(void* data);
> static int checkpoint_trampoline(void* data);
> +static int post_sendstate_trampoline(void *data);
>
> static PyObject* Checkpoint_new(PyTypeObject* type, PyObject* args,
> PyObject* kwargs)
> @@ -169,10 +181,17 @@ static PyObject* pycheckpoint_start(PyObject* obj, PyObject* args) {
> } else
> self->setup_cb = NULL;
>
> + if (flags & CHECKPOINT_FLAGS_COLO)
> + self->colo = 1;
> + else
> + self->colo = 0;
> + self->first_time = 1;
> +
> memset(&callbacks, 0, sizeof(callbacks));
> callbacks.suspend = suspend_trampoline;
> callbacks.postcopy = postcopy_trampoline;
> callbacks.checkpoint = checkpoint_trampoline;
> + callbacks.post_sendstate = post_sendstate_trampoline;
> callbacks.data = self;
>
> self->threadstate = PyEval_SaveThread();
> @@ -279,6 +298,196 @@ PyMODINIT_FUNC initcheckpoint(void) {
> block_timer();
> }
>
> +/* colo functions */
> +
> +/* master slaver comment
> + * "continue" ===>
> + * <=== "suspend" guest is suspended
> + */
> +static int notify_slaver_suspend(CheckpointObject *self)
> +{
> + int fd = self->cps.fd;
> +
> + if (self->first_time == 1)
> + return 0;
> +
> + return write_exact(fd, "continue", 8);
> +}
> +
> +static int wait_slaver_suspend(CheckpointObject *self)
> +{
> + int fd = self->cps.fd;
> + xc_interface *xch = self->cps.xch;
> + char buf[8];
> +
> + if (self->first_time == 1)
> + return 0;
> +
> + if ( read_exact(fd, buf, 7) < 0) {
> + PERROR("read: suspend");
> + return -1;
> + }
> +
> + buf[7] = '\0';
> + if (strcmp(buf, "suspend")) {
> + PERROR("read \"%s\", expect \"suspend\"", buf);
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int notify_slaver_start_checkpoint(CheckpointObject *self)
> +{
> + int fd = self->cps.fd;
> + xc_interface *xch = self->cps.xch;
> +
> + if (self->first_time == 1)
> + return 0;
> +
> + if ( write_exact(fd, "start", 5) < 0) {
> + PERROR("write start");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * master slaver
> + * <==== "finish"
> + * flush packets
> + * "resume" ====>
> + * resume vm resume vm
> + * <==== "resume"
> + */
> +static int notify_slaver_resume(CheckpointObject *self)
> +{
> + int fd = self->cps.fd;
> + xc_interface *xch = self->cps.xch;
> + char buf[7];
> +
> + /* wait slaver to finish update memory, device state... */
> + if ( read_exact(fd, buf, 6) < 0) {
> + PERROR("read: finish");
> + return -1;
> + }
> +
> + buf[6] = '\0';
> + if (strcmp(buf, "finish")) {
> + ERROR("read \"%s\", expect \"finish\"", buf);
> + return -1;
> + }
> +
> + if (!self->first_time)
> + /* flush queued packets now */
> + ioctl(self->dev_fd, COMP_IOCTFLUSH);
> +
> + /* notify slaver to resume vm*/
> + if (write_exact(fd, "resume", 6) < 0) {
> + PERROR("write: resume");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int install_fw_network(CheckpointObject *self)
> +{
> + int rc;
> + PyObject* result;
> +
> + PyEval_RestoreThread(self->threadstate);
> + result = PyObject_CallFunction(self->setup_cb, NULL);
> + self->threadstate = PyEval_SaveThread();
> +
> + if (!result)
> + return -1;
> +
> + if (result == Py_None || PyObject_IsTrue(result))
> + rc = 0;
> + else
> + rc = -1;
> +
> + Py_DECREF(result);
> +
> + return rc;
> +}
> +
> +static int wait_slaver_resume(CheckpointObject *self)
> +{
> + int fd = self->cps.fd;
> + xc_interface *xch = self->cps.xch;
> + char buf[7];
> +
> + if (read_exact(fd, buf, 6) < 0) {
> + PERROR("read resume");
> + return -1;
> + }
> +
> + buf[6] = '\0';
> + if (strcmp(buf, "resume")) {
> + ERROR("read \"%s\", expect \"resume\"", buf);
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int colo_postresume(CheckpointObject *self)
> +{
> + int rc;
> + int dev_fd = self->dev_fd;
> +
> + rc = wait_slaver_resume(self);
> + if (rc < 0)
> + return rc;
> +
> + if (self->first_time) {
> + rc = install_fw_network(self);
> + if (rc < 0) {
> + fprintf(stderr, "install network fails\n");
> + return rc;
> + }
> + } else {
> + ioctl(dev_fd, COMP_IOCTRESUME);
> + }
> +
> + return 0;
> +}
> +
> +static int pre_checkpoint(CheckpointObject *self)
> +{
> + xc_interface *xch = self->cps.xch;
> +
> + if (!self->first_time)
> + return 0;
> +
> + self->dev_fd = open("/dev/HA_compare", O_RDWR);
> + if (self->dev_fd < 0) {
> + PERROR("opening /dev/HA_compare fails");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static void wait_new_checkpoint(CheckpointObject *self)
> +{
> + int dev_fd = self->dev_fd;
> + int err;
> +
> + while (1) {
> + err = ioctl(dev_fd, COMP_IOCTWAIT);
> + if (err == 0)
> + break;
> +
> + if (err == -1 && errno != ERESTART && errno != ETIME) {
> + fprintf(stderr, "ioctl() returns -1, errno: %d\n", errno);
> + }
> + }
> +}
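As posted, wait_new_checkpoint() only leaves its loop when the ioctl returns 0; any other errno is printed and the call is retried. One possible alternative, sketched below under stated assumptions, retries only on timeout or interruption and reports everything else to the caller. The wait_fn hook is a hypothetical stand-in for ioctl(dev_fd, COMP_IOCTWAIT), flaky_wait() and broken_wait() are example hooks, and EINTR is an assumption about the interrupt-style errno (the patch checks ERESTART).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook standing in for ioctl(dev_fd, COMP_IOCTWAIT). */
typedef int (*wait_fn)(void *ctx);

/*
 * Call do_wait() until it succeeds. Timeouts and interrupted calls are
 * retried; any other errno is returned to the caller as -1.
 */
static int wait_until_ready(wait_fn do_wait, void *ctx)
{
    for (;;) {
        if (do_wait(ctx) == 0)
            return 0;
        if (errno != EINTR && errno != ETIME)
            return -1;
    }
}

/* Example hook: times out twice, then succeeds on the third call. */
static int flaky_wait(void *ctx)
{
    int *calls = ctx;

    if (++*calls < 3) {
        errno = ETIME;
        return -1;
    }
    return 0;
}

/* Example hook: always fails with an errno that is not retryable. */
static int broken_wait(void *ctx)
{
    (void)ctx;
    errno = EBADF;
    return -1;
}
```

With a return value, checkpoint_trampoline() could fail the checkpoint instead of blocking indefinitely if the compare device goes away.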
> +
> /* private functions */
>
> /* bounce C suspend call into python equivalent.
> @@ -289,6 +498,13 @@ static int suspend_trampoline(void* data)
>
> PyObject* result;
>
> + if (self->colo) {
> + if (notify_slaver_suspend(self) < 0) {
> + fprintf(stderr, "notifying slaver suspend fails\n");
> + return 0;
> + }
> + }
> +
> /* call default suspend function, then python hook if available */
> if (self->armed) {
> if (checkpoint_wait(&self->cps) < 0) {
> @@ -307,8 +523,16 @@ static int suspend_trampoline(void* data)
> }
> }
>
> + /* suspend_cb() should be called after both sides are suspended */
> + if (self->colo) {
> + if (wait_slaver_suspend(self) < 0) {
> + fprintf(stderr, "waiting slaver suspend fails\n");
> + return 0;
> + }
> + }
> +
> if (!self->suspend_cb)
> - return 1;
> + goto start_checkpoint;
>
> PyEval_RestoreThread(self->threadstate);
> result = PyObject_CallFunction(self->suspend_cb, NULL);
> @@ -319,12 +543,32 @@ static int suspend_trampoline(void* data)
>
> if (result == Py_None || PyObject_IsTrue(result)) {
> Py_DECREF(result);
> - return 1;
> + goto start_checkpoint;
> }
>
> Py_DECREF(result);
>
> return 0;
> +
> +start_checkpoint:
> + if (self->colo) {
> + if (notify_slaver_start_checkpoint(self) < 0) {
> + fprintf(stderr, "notifying slaver to start checkpoint fails\n");
> + return 0;
> + }
> +
> + /* PVM is suspended first when doing live migration,
> + * and then it is suspended for a new checkpoint.
> + */
> + if (self->first_time == 1)
> + /* live migration */
> + self->first_time = 2;
> + else if (self->first_time == 2)
> + /* the first checkpoint */
> + self->first_time = 0;
> + }
> +
> + return 1;
> }
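The first_time field is used as a three-state counter here (1, then 2, then 0), not as a boolean, and the other helpers test it in different ways. A sketch of the same transitions with named states may make the intent easier to follow. The enum and function names below are hypothetical and do not appear in the patch; the numeric values match the ones the patch stores.

```c
/* Hypothetical names for the values the patch stores in first_time. */
enum colo_phase {
    COLO_STEADY = 0,            /* every checkpoint after the first one */
    COLO_LIVE_MIGRATION = 1,    /* set in start(), before the first suspend */
    COLO_FIRST_CHECKPOINT = 2,  /* migration suspend done, first checkpoint pending */
};

/* Advance the phase once per successful suspend, as suspend_trampoline() does. */
static enum colo_phase colo_next_phase(enum colo_phase phase)
{
    switch (phase) {
    case COLO_LIVE_MIGRATION:
        return COLO_FIRST_CHECKPOINT;
    case COLO_FIRST_CHECKPOINT:
    case COLO_STEADY:
        return COLO_STEADY;
    }
    return COLO_STEADY;
}
```

Under this reading, the handshake helpers that test first_time == 1 are skipped only during the initial live migration, while checks of the form if (self->first_time) are also true for the first checkpoint.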
>
> static int postcopy_trampoline(void* data)
> @@ -334,6 +578,13 @@ static int postcopy_trampoline(void* data)
> PyObject* result;
> int rc = 0;
>
> + if (self->colo) {
> + if (notify_slaver_resume(self) < 0) {
> + fprintf(stderr, "notifying slaver resume fails\n");
> + return 0;
> + }
> + }
> +
> if (!self->postcopy_cb)
> goto resume;
>
> @@ -352,6 +603,13 @@ static int postcopy_trampoline(void* data)
> return 0;
> }
>
> + if (self->colo) {
> + if (colo_postresume(self) < 0) {
> + fprintf(stderr, "postresume fails\n");
> + return 0;
> + }
> + }
> +
> return rc;
> }
>
> @@ -366,8 +624,15 @@ static int checkpoint_trampoline(void* data)
> return -1;
> }
>
> + if (self->colo) {
> + if (pre_checkpoint(self) < 0) {
> + fprintf(stderr, "pre_checkpoint() fails\n");
> + return -1;
> + }
> + }
> +
> if (!self->checkpoint_cb)
> - return 0;
> + goto wait_checkpoint;
>
> PyEval_RestoreThread(self->threadstate);
> result = PyObject_CallFunction(self->checkpoint_cb, NULL);
> @@ -378,10 +643,39 @@ static int checkpoint_trampoline(void* data)
>
> if (result == Py_None || PyObject_IsTrue(result)) {
> Py_DECREF(result);
> - return 1;
> + goto wait_checkpoint;
> }
>
> Py_DECREF(result);
>
> return 0;
> +
> +wait_checkpoint:
> + if (self->colo) {
> + wait_new_checkpoint(self);
> + }
> +
> + fprintf(stderr, "\n\nnew checkpoint..........\n");
> +
> + return 1;
> +}
> +
> +static int post_sendstate_trampoline(void* data)
> +{
> + CheckpointObject *self = data;
> + int fd = self->cps.fd;
> + int i = XC_SAVE_ID_LAST_CHECKPOINT;
> +
> + if (!self->colo)
> + return 0;
> +
> + /* In colo mode, guest is running on slaver side, so we should
> + * send XC_SAVE_ID_LAST_CHECKPOINT to slaver.
> + */
> + if (write_exact(fd, &i, sizeof(int)) < 0) {
> + fprintf(stderr, "writing XC_SAVE_ID_LAST_CHECKPOINT fails\n");
> + return -1;
> + }
> +
> + return 0;
> }
> diff --git a/tools/python/xen/lowlevel/checkpoint/checkpoint.h b/tools/python/xen/lowlevel/checkpoint/checkpoint.h
> index 187d9d7..96fc949 100644
> --- a/tools/python/xen/lowlevel/checkpoint/checkpoint.h
> +++ b/tools/python/xen/lowlevel/checkpoint/checkpoint.h
> @@ -41,6 +41,7 @@ typedef struct {
> } checkpoint_state;
>
> #define CHECKPOINT_FLAGS_COMPRESSION 1
> +#define CHECKPOINT_FLAGS_COLO 2
> char* checkpoint_error(checkpoint_state* s);
>
> void checkpoint_init(checkpoint_state* s);