From: Wen Congyang <wency@cn.fujitsu.com>
To: xen devel <xen-devel@lists.xen.org>
Cc: Ian Campbell <Ian.Campbell@citrix.com>,
	Wen congyang <wency@cn.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Jiang Yunhong <yunhong.jiang@intel.com>,
	Dong Eddie <eddie.dong@intel.com>,
	Yang Hongyang <yanghy@cn.fujitsu.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>
Subject: Re: [RFC Patch 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
Date: Fri, 18 Jul 2014 19:43:54 +0800
Message-ID: <53C9087A.40609@cn.fujitsu.com>
In-Reply-To: <1405683551-12579-1-git-send-email-wency@cn.fujitsu.com>

At 07/18/2014 07:38 PM, Wen Congyang wrote:
> Virtual machine (VM) replication is a well-known technique for providing
> application-agnostic, software-implemented hardware fault tolerance, i.e.
> "non-stop service". Remus currently provides this function, but it buffers
> all output packets, so the latency is unacceptable.
> 
> At Xen Summit 2012, we introduced a new VM replication solution: COLO
> (COarse-grain LOck-stepping virtual machines). The presentation is at
> the following URL:
> http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service
> 
> Here is the summary of the solution:
> From the client's point of view, as long as the client observes identical
> responses from the primary and secondary VMs, according to the service
> semantics, the secondary VM is a valid replica of the primary VM and
> can successfully take over when a hardware failure of the primary VM
> is detected.
> 
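
To illustrate the idea in the summary above, here is a minimal sketch of
output comparison. It is not code from this patchset. The class name
ColoComparer, its fields, and the policy of forcing a checkpoint when the
two outputs differ are assumptions made only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColoComparer:
    """Hypothetical comparer for primary/secondary VM output packets."""

    released: List[bytes] = field(default_factory=list)  # packets sent to the client
    checkpoints: int = 0                                  # forced resynchronisations

    def handle(self, primary_out: bytes, secondary_out: bytes) -> str:
        """Compare one pair of output packets and decide what to do."""
        if primary_out == secondary_out:
            # Identical responses: the secondary is still a valid replica,
            # so the packet can be released without buffering.
            self.released.append(primary_out)
            return "release"
        # Responses differ (assumed policy): resynchronise the secondary
        # with a checkpoint, then release the primary's packet.
        self.checkpoints += 1
        self.released.append(primary_out)
        return "checkpoint"


comparer = ColoComparer()
print(comparer.handle(b"HTTP/1.1 200 OK", b"HTTP/1.1 200 OK"))
print(comparer.handle(b"seq=41", b"seq=42"))
```

In this sketch, output is delayed only when the two VMs disagree, whereas
buffering every packet adds latency to all of them.
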
> This patchset is an RFC and implements the framework of COLO:
> 1. Both the primary VM and the secondary VM are running
> 2. Do checkpoint
> 
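
The two steps above can be modelled with a toy example. This is a sketch
under assumptions, not the libxl/libxc implementation: memory is a dict
keyed by pfn, and colo_checkpoint() is a hypothetical helper. The sketch
assumes that pages dirtied on either side since the last checkpoint must be
copied from the primary, because both VMs keep running between checkpoints.

```python
from typing import Dict, List, Set


def colo_checkpoint(primary_mem: Dict[int, str],
                    secondary_mem: Dict[int, str],
                    primary_dirty: Set[int],
                    secondary_dirty: Set[int]) -> List[int]:
    """Copy every page dirtied on either side from primary to secondary.

    Returns the sorted list of pfns that were synchronised.
    """
    # Pages changed by the primary carry new state; pages changed by the
    # secondary must be overwritten so that it matches the primary again.
    to_sync = set(primary_dirty) | set(secondary_dirty)
    for pfn in sorted(to_sync):
        secondary_mem[pfn] = primary_mem[pfn]
    # Both sides start the next epoch with empty dirty sets.
    primary_dirty.clear()
    secondary_dirty.clear()
    return sorted(to_sync)


# Both VMs ran since the last checkpoint and each dirtied a different page.
primary = {0: "A", 1: "B-new", 2: "C"}
secondary = {0: "A", 1: "B", 2: "C-secondary"}
synced = colo_checkpoint(primary, secondary, {1}, {2})
print(synced, secondary == primary)
```
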
> This patchset is based on remus-v15 and uses migration v1. It only
> supports HVM guests for now.
> 
> TODO list:
> 1. rebase to remus-v17 or newer
> 2. support migration v2
> 3. nic/disk replication
> 4. support pvm
> 
> Patch 1-3: bugfixes
> Patch 4-6: temporarily update Remus so that the Remus device code can be reused
> Patch 7-14: update some APIs that will be used by COLO
> Patch 15-22: COLO-related code
> Patch 23: hack patch, just for testing
> Patch 24-25: bugfix. We found this bug before rebasing COLO onto the
>           newest Xen, but we do not trigger it now.
> Patch 26: a patch for qemu-xen

I have also put the code on GitHub:
https://github.com/wencongyang/xen/tree/colo

> 
> Hong Tao (1):
>   copy the correct page to memory
> 
> Wen Congyang (24):
>   csum the correct page
>   don't zero out ioreq page
>   don't touch remus in remus_device
>   rename remus device to checkpoint device
>   adjust the indentation
>   Refactor domain_suspend_callback_common()
>   Update libxl__domain_resume() for colo
>   Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo
>   Introduce a new internal API libxl__domain_unpause()
>   Update libxl__domain_unpause() to support qemu-xen
>   support to resume uncooperative HVM guests
>   update datecopier to support sending data only
>   introduce a new API to aync read data from fd
>   Update libxl_save_msgs_gen.pl to support return data from xl to xc
>   Allow slave sends data to master
>   secondary vm suspend/resume/checkpoint code
>   primary vm suspend/get_dirty_pfn/resume/checkpoint code
>   xc_domain_save: flush cache before calling callbacks->postcopy() in
>     colo mode
>   COLO: xc related codes
>   send store mfn and console mfn to xl before resuming secondary vm
>   implement the cmdline for COLO
>   HACK: do checkpoint per 20ms
>   fix vm entry fail
>   sync mmu before resuming secondary vm
> 
>  docs/man/xl.pod.1                                  |   9 +-
>  tools/libxc/xc_domain.c                            |   9 +
>  tools/libxc/xc_domain_restore.c                    |  74 +-
>  tools/libxc/xc_domain_save.c                       |  66 +-
>  tools/libxc/xc_resume.c                            |  20 +-
>  tools/libxc/xenctrl.h                              |   2 +
>  tools/libxc/xenguest.h                             |  40 +
>  tools/libxl/Makefile                               |   3 +-
>  tools/libxl/libxl.c                                | 102 ++-
>  tools/libxl/libxl.h                                |   3 +-
>  tools/libxl/libxl_aoutils.c                        |  81 +-
>  ...xl_remus_device.c => libxl_checkpoint_device.c} | 266 ++++---
>  tools/libxl/libxl_colo.h                           |  48 ++
>  tools/libxl/libxl_colo_restore.c                   | 882 +++++++++++++++++++++
>  tools/libxl/libxl_colo_save.c                      | 602 ++++++++++++++
>  tools/libxl/libxl_create.c                         | 131 ++-
>  tools/libxl/libxl_dom.c                            | 424 ++++++----
>  tools/libxl/libxl_internal.h                       | 262 ++++--
>  tools/libxl/libxl_netbuffer.c                      |  85 +-
>  tools/libxl/libxl_nonetbuffer.c                    |  14 +-
>  tools/libxl/libxl_qmp.c                            |  10 +
>  tools/libxl/libxl_remus_disk_drbd.c                |  54 +-
>  tools/libxl/libxl_save_callout.c                   |  37 +-
>  tools/libxl/libxl_save_helper.c                    |  17 +
>  tools/libxl/libxl_save_msgs_gen.pl                 |  74 +-
>  tools/libxl/libxl_types.idl                        |  12 +-
>  tools/libxl/xl_cmdimpl.c                           |  54 +-
>  tools/libxl/xl_cmdtable.c                          |   3 +-
>  xen/arch/x86/domctl.c                              |  15 +
>  xen/arch/x86/hvm/save.c                            |   6 +
>  xen/arch/x86/hvm/vmx/vmcs.c                        |   8 +
>  xen/arch/x86/hvm/vmx/vmx.c                         |   8 +
>  xen/include/asm-x86/hvm/hvm.h                      |   1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h                 |   1 +
>  xen/include/public/domctl.h                        |   1 +
>  xen/include/xen/hvm/save.h                         |   2 +
>  36 files changed, 2895 insertions(+), 531 deletions(-)
>  rename tools/libxl/{libxl_remus_device.c => libxl_checkpoint_device.c} (47%)
>  create mode 100644 tools/libxl/libxl_colo.h
>  create mode 100644 tools/libxl/libxl_colo_restore.c
>  create mode 100644 tools/libxl/libxl_colo_save.c
> 
