* [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
@ 2015-04-01  6:41 Yang Hongyang
From: Yang Hongyang @ 2015-04-01  6:41 UTC
  To: xen-devel
  Cc: wei.liu2, ian.campbell, wency, ian.jackson, yunhong.jiang,
	eddie.dong, rshriram, yanghy

This patchset is for xen-4.6. The main differences from previous versions are:
1. Use qdisk block replication
   http://wiki.qemu.org/Features/BlockReplication
2. NIC replication based on colo-proxy
   http://wiki.qemu.org/Features/COLO#Components
Note that COLO is under active development; this version is not well
tested and has some known problems.
We are posting it early to give you a brief impression of how COLO
will be implemented, and we would like your comments on the general idea
of COLO and, of course, on the implementation. If you have any ideas or
suggestions, please do not hesitate to share them. Thanks in advance.

Virtual machine (VM) replication is a well-known technique for providing
application-agnostic, software-implemented hardware fault tolerance, i.e.
"non-stop service". Currently, Remus provides this function, but it buffers
all output packets, and the resulting latency is unacceptable.
At Xen Summit 2012, we introduced a new VM replication solution: COLO
(COarse-grain LOck-stepping virtual machine). The presentation is available
at the following URL:
http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service

Here is a summary of the solution:
From the client's point of view, as long as the client observes identical
responses from the primary and secondary VMs, according to the service
semantics, the secondary VM is a valid replica of the primary VM and can
successfully take over when a hardware failure of the primary VM is
detected.
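
The comparison rule above can be illustrated with a small toy model. This is
only a sketch and not the colo-proxy code: the class name ColoComparator and
its fields are hypothetical, and the behaviour on a mismatch (force a
checkpoint, then release the primary's response) is an assumption taken from
the general COLO design rather than something stated in this mail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColoComparator:
    """Toy model of COLO output comparison (hypothetical, illustration only)."""

    released: List[bytes] = field(default_factory=list)  # responses sent to the client
    checkpoints: int = 0                                  # forced resynchronizations

    def on_outputs(self, primary: bytes, secondary: bytes) -> bool:
        """Handle one pair of responses; return True if they were identical."""
        if primary == secondary:
            # Identical responses: the client cannot tell the two VMs apart,
            # so the response is released without waiting for a checkpoint.
            self.released.append(primary)
            return True
        # Divergent responses (assumed behaviour): resynchronize the secondary
        # from the primary, modelled here as a counter, then release the
        # primary's response.
        self.checkpoints += 1
        self.released.append(primary)
        return False


if __name__ == "__main__":
    cmp = ColoComparator()
    cmp.on_outputs(b"reply-1", b"reply-1")   # match: released immediately
    cmp.on_outputs(b"reply-2", b"reply-2x")  # mismatch: checkpoint forced
    print(cmp.checkpoints, len(cmp.released))
```

In contrast to Remus, which buffers every output packet until the next
checkpoint, this model only pays the checkpoint cost when the two outputs
actually differ.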

This patchset is based on migration v1.
Only HVM guests are supported for now. The code is also hosted on GitHub:
https://github.com/macrosheep/xen/tree/COLO_RFC_v5

TODO list:
1. Code reviews and bug fixes
2. Switch to migration v2
3. Support PV guests

Known bugs:
1. The secondary VM may crash due to a triple fault.

Wiki pages:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
http://wiki.qemu.org/Features/COLO

Patch 1    : Add readme
Patch 2-8  : Refactoring and preparatory work
Patch 9-12 : Update Remus so that its device code can be reused
Patch 13-21: COLO framework code
Patch 22-23: Implement disk replication
Patch 24-29: Implement NIC replication

Changelog from v4 to v5:
1. rebase to the latest xen upstream
2. disk replication: blktap2->qdisk
3. nic replication: colo-agent->colo-proxy

Changelog from v3 to v4:
1. rebase to newest xen
2. bug fix

Changelog from v2 to v3:
1. rebase to newest remus
2. add nic replication support

Changelog from v1 to v2:
1. rebase to newest remus
2. add disk replication support

Wen Congyang (23):
  Add readme
  Refactor domain_suspend_callback_common()
  tools: libxl: introduce a new API libxl__domain_restore() to read qemu
    state
  Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo
  Introduce a new internal API libxl__domain_unpause()
  Update libxl__domain_unpause() to support qemu-xen
  support to resume uncooperative HVM guests
  tools/libxl: Introduce bitops macros
  move remus related codes to libxl_remus.c
  rename remus device to checkpoint device
  adjust the indentation
  don't touch remus in checkpoint_device
  Update libxl_save_msgs_gen.pl to support return data from xl to xc
  Allow slave sends data to master
  secondary vm suspend/resume/checkpoint code
  primary vm suspend/get_dirty_pfn/resume/checkpoint code
  xc_domain_save: flush cache before calling callbacks->postcopy() in
    colo mode
  COLO: xc related codes
  send store mfn and console mfn to xl before resuming secondary vm
  implement the cmdline for COLO
  tools: xc_doamin_restore: zero ioreq page only one time
  Support colo mode for qemu disk
  COLO: use qemu block replication

Yang Hongyang (6):
  COLO proxy: implement setup/teardown of COLO proxy module
  COLO proxy: preresume, postresume and checkpoint
  COLO nic: implement COLO nic subkind
  setup and control colo proxy on primary side
  setup and control colo proxy on secondary side
  cmdline switches and config vars to control colo-proxy

 docs/README.colo                      |   92 +++
 docs/man/xl.conf.pod.5                |    6 +
 docs/man/xl.pod.1                     |   11 +-
 tools/hotplug/Linux/Makefile          |    1 +
 tools/hotplug/Linux/colo-proxy-setup  |  128 ++++
 tools/libxc/include/xenguest.h        |   40 ++
 tools/libxc/xc_domain_restore.c       |  106 ++-
 tools/libxc/xc_domain_save.c          |   71 +-
 tools/libxc/xc_resume.c               |   20 +-
 tools/libxl/Makefile                  |    6 +-
 tools/libxl/libxl.c                   |  185 +++--
 tools/libxl/libxl_bitops.h            |   79 +++
 tools/libxl/libxl_checkpoint_device.c |  282 ++++++++
 tools/libxl/libxl_colo.h              |   53 ++
 tools/libxl/libxl_colo_nic.c          |  313 +++++++++
 tools/libxl/libxl_colo_proxy.c        |  267 ++++++++
 tools/libxl/libxl_colo_qdisk.c        |  209 ++++++
 tools/libxl/libxl_colo_restore.c      | 1190 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_colo_save.c         |  782 ++++++++++++++++++++++
 tools/libxl/libxl_create.c            |  166 ++++-
 tools/libxl/libxl_device.c            |   38 ++
 tools/libxl/libxl_dm.c                |  262 +++++++-
 tools/libxl/libxl_dom.c               |  569 +++++++---------
 tools/libxl/libxl_internal.h          |  302 ++++++---
 tools/libxl/libxl_netbuffer.c         |  117 ++--
 tools/libxl/libxl_nonetbuffer.c       |   10 +-
 tools/libxl/libxl_qmp.c               |   41 ++
 tools/libxl/libxl_remus.c             |  373 +++++++++++
 tools/libxl/libxl_remus.h             |   27 +
 tools/libxl/libxl_remus_device.c      |  327 ---------
 tools/libxl/libxl_remus_disk_drbd.c   |   57 +-
 tools/libxl/libxl_save_callout.c      |   37 +-
 tools/libxl/libxl_save_helper.c       |   17 +
 tools/libxl/libxl_save_msgs_gen.pl    |   74 +-
 tools/libxl/libxl_types.idl           |   20 +-
 tools/libxl/libxlu_disk_l.l           |    5 +
 tools/libxl/xl.c                      |    3 +
 tools/libxl/xl.h                      |    1 +
 tools/libxl/xl_cmdimpl.c              |  101 ++-
 tools/libxl/xl_cmdtable.c             |    4 +-
 40 files changed, 5413 insertions(+), 979 deletions(-)
 create mode 100644 docs/README.colo
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_bitops.h
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_nic.c
 create mode 100644 tools/libxl/libxl_colo_proxy.c
 create mode 100644 tools/libxl/libxl_colo_qdisk.c
 create mode 100644 tools/libxl/libxl_colo_restore.c
 create mode 100644 tools/libxl/libxl_colo_save.c
 create mode 100644 tools/libxl/libxl_remus.c
 create mode 100644 tools/libxl/libxl_remus.h
 delete mode 100644 tools/libxl/libxl_remus_device.c

-- 
1.9.1

Thread overview: 47+ messages
2015-04-01  6:41 [RFC PATCH COLO v5 00/29] COarse-grain LOck-stepping Virtual Machines for Non-stop Service Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 01/29] Add readme Yang Hongyang
2015-04-08 18:11   ` Wei Liu
2015-04-14  4:06     ` Hongyang Yang
2015-04-01  6:41 ` [RFC PATCH COLO v5 02/29] Refactor domain_suspend_callback_common() Yang Hongyang
2015-04-08 18:11   ` Wei Liu
2015-04-14  5:56     ` Wen Congyang
2015-04-22 14:45   ` Ian Campbell
2015-04-01  6:41 ` [RFC PATCH COLO v5 03/29] tools: libxl: introduce a new API libxl__domain_restore() to read qemu state Yang Hongyang
2015-04-08 18:11   ` Wei Liu
2015-04-15 13:19     ` Ian Jackson
2015-04-01  6:41 ` [RFC PATCH COLO v5 04/29] Update libxl__domain_suspend_common_switch_qemu_logdirty() for colo Yang Hongyang
2015-04-08 18:12   ` Wei Liu
2015-04-01  6:41 ` [RFC PATCH COLO v5 05/29] Introduce a new internal API libxl__domain_unpause() Yang Hongyang
2015-04-08 18:12   ` Wei Liu
2015-04-01  6:41 ` [RFC PATCH COLO v5 06/29] Update libxl__domain_unpause() to support qemu-xen Yang Hongyang
2015-04-08 18:12   ` Wei Liu
2015-04-01  6:41 ` [RFC PATCH COLO v5 07/29] support to resume uncooperative HVM guests Yang Hongyang
2015-04-08 18:12   ` Wei Liu
2015-04-23 12:09     ` Wen Congyang
2015-04-22 14:54   ` Ian Campbell
2015-04-23 12:08     ` Wen Congyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 08/29] tools/libxl: Introduce bitops macros Yang Hongyang
2015-04-22 15:10   ` Ian Campbell
2015-04-23 11:56     ` Wen Congyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 09/29] move remus related codes to libxl_remus.c Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 10/29] rename remus device to checkpoint device Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 11/29] adjust the indentation Yang Hongyang
2015-04-22 15:20   ` Ian Campbell
2015-04-01  6:41 ` [RFC PATCH COLO v5 12/29] don't touch remus in checkpoint_device Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 13/29] Update libxl_save_msgs_gen.pl to support return data from xl to xc Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 14/29] Allow slave sends data to master Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 15/29] secondary vm suspend/resume/checkpoint code Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 16/29] primary vm suspend/get_dirty_pfn/resume/checkpoint code Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 17/29] xc_domain_save: flush cache before calling callbacks->postcopy() in colo mode Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 18/29] COLO: xc related codes Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 19/29] send store mfn and console mfn to xl before resuming secondary vm Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 20/29] implement the cmdline for COLO Yang Hongyang
2015-04-01  6:41 ` [RFC PATCH COLO v5 21/29] tools: xc_doamin_restore: zero ioreq page only one time Yang Hongyang
2015-04-01  6:55 ` [RFC PATCH COLO v5 22/29] Support colo mode for qemu disk Yang Hongyang
2015-04-01  6:57 ` [RFC PATCH COLO v5 23/29] COLO: use qemu block replication Yang Hongyang
2015-04-01  6:57 ` [RFC PATCH COLO v5 24/29] COLO proxy: implement setup/teardown of COLO proxy module Yang Hongyang
2015-04-01  6:57 ` [RFC PATCH COLO v5 25/29] COLO proxy: preresume, postresume and checkpoint Yang Hongyang
2015-04-01  6:57 ` [RFC PATCH COLO v5 26/29] COLO nic: implement COLO nic subkind Yang Hongyang
2015-04-01  6:58 ` [RFC PATCH COLO v5 27/29] setup and control colo proxy on primary side Yang Hongyang
2015-04-01  6:58 ` [RFC PATCH COLO v5 28/29] setup and control colo proxy on secondary side Yang Hongyang
2015-04-01  6:58 ` [RFC PATCH COLO v5 29/29] cmdline switches and config vars to control colo-proxy Yang Hongyang
