* [RFC Patch v4 00/18] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
@ 2014-10-24  7:05 Wen Congyang
From: Wen Congyang @ 2014-10-24  7:05 UTC (permalink / raw)
  To: xen devel
  Cc: Ian Campbell, Wen Congyang, Ian Jackson, Jiang Yunhong,
	Dong Eddie, Yang Hongyang, Lai Jiangshan

This patchset is not for xen-4.5, so I will stop updating it until xen 4.5 is
released.

Virtual machine (VM) replication is a well-known technique for providing
application-agnostic software-implemented hardware fault tolerance -
"non-stop service". Currently, remus provides this function, but it buffers
all output packets, so the latency is unacceptable.

At Xen Summit 2012, we introduced a new VM replication solution: COLO
(COarse-grain LOck-stepping virtual machine). The presentation is available
at the following URL:
http://www.slideshare.net/xen_com_mgr/colo-coarsegrain-lockstepping-virtual-machines-for-nonstop-service

Here is the summary of the solution:
From the client's point of view, as long as the client observes identical
responses from the primary and secondary VMs, according to the service
semantics, then the secondary vm is a valid replica of the primary
vm, and can successfully take over when a hardware failure of the
primary vm is detected.
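
The equivalence criterion above can be illustrated with a toy model. This is
only a sketch, not the colo-agent code: the class and method names are
hypothetical, and the rule that a mismatch forces a checkpoint is an
assumption taken from the general COLO design, not from this patchset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutputComparator:
    """Toy model of the client-visible equivalence check (hypothetical names)."""
    released: List[bytes] = field(default_factory=list)
    checkpoints: int = 0

    def on_output(self, primary_pkt: bytes, secondary_pkt: bytes) -> bool:
        """Compare one response pair; return True if the replicas still agree."""
        same = primary_pkt == secondary_pkt
        if not same:
            # Assumption: divergence forces a checkpoint so that the
            # secondary is resynchronised from the primary's state.
            self.checkpoints += 1
        # In this model the client only ever sees the primary's response.
        self.released.append(primary_pkt)
        return same


comparator = OutputComparator()
comparator.on_output(b"HTTP/1.1 200 OK", b"HTTP/1.1 200 OK")  # identical
comparator.on_output(b"seq=7", b"seq=9")                      # diverged
print(comparator.checkpoints, len(comparator.released))
```

While the responses match, no checkpoint is needed and nothing has to be
buffered, which is where the latency gain over remus is expected to come from.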

This patchset is an RFC. It implements the framework and disk replication of COLO:
1. Both primary vm and secondary vm are running
2. do checkpoint
3. disk replication (use blktap2)
4. nic replication (use colo-agent)

This patchset is based on bugfix and the colo-prepare patchset, and uses migration v1.
Only hvm guests are supported for now. The code is also hosted on github:
https://github.com/wencongyang/xen/tree/colo-v4

TODO list:
1. Use migration v2 to implement COLO
2. support pvm

Known bugs:
1. Secondary vm may crash due to triple fault.

Usage:
1. update the vm's configfile:
   disk:
        disk = [ 'format=raw,devtype=disk,access=w,vdev=hda,backendtype=tap,filter=colo,filter-params=192.168.3.1:9000,target=/root/images/hvm/hvm_nopv/hvm.img' ]
   nic:
        vif = [ 'mac=00:16:4f:00:00:11, bridge=br0, model=e1000, forwarddev=eth0' ]
   Note: the forwarddev of the primary and secondary hosts should be connected
   directly, and no other application should use it. If you don't have such a
   nic, you can use a vlan to create one.
2. build colo-agent:
   You can get colo-agent from github:
   https://github.com/wencongyang/colo-agent
3. run:
   xl remus -c -u <domname> <secondary host IP>
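
The disk and vif strings in step 1 are comma-separated key=value lists. The
sketch below splits them apart so the COLO-specific keys (filter,
filter-params, forwarddev) are easy to check before starting the vm. It is
illustrative only and is not xl's real parser; parse_spec is a hypothetical
helper, and reading filter-params as host:port is an assumption.

```python
def parse_spec(spec: str) -> dict:
    """Split 'k1=v1,k2=v2' into a dict (illustrative; not xl's real parser)."""
    result = {}
    for item in spec.split(","):
        # Split on the first '=' only, so values such as a MAC address
        # or a host:port pair are kept intact.
        key, _, value = item.strip().partition("=")
        result[key] = value
    return result


disk = parse_spec(
    "format=raw,devtype=disk,access=w,vdev=hda,backendtype=tap,"
    "filter=colo,filter-params=192.168.3.1:9000,"
    "target=/root/images/hvm/hvm_nopv/hvm.img"
)
vif = parse_spec("mac=00:16:4f:00:00:11, bridge=br0, model=e1000, forwarddev=eth0")

# Assumption: filter-params has the form host:port.
host, _, port = disk["filter-params"].partition(":")
print(disk["filter"], host, port, vif["forwarddev"])
```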

Patch 1-4  : update remus to reuse remus device codes
Patch 5-13 : COLO framework related codes
Patch 14   : implement block-colo
Patch 15   : implement disk replication
Patch 16-18: implement nic replication
Patch 19   : A patch for qemu-xen

Changelog from v3 to v4:
1. rebase to newest xen
2. bug fix

Changelog from v2 to v3:
1. rebase to newest remus
2. add nic replication support

Changelog from v1 to v2:
1. rebase to newest remus
2. add disk replication support

Wen Congyang (18):
  move remus related codes to libxl_remus.c
  rename remus device to checkpoint device
  adjust the indentation
  don't touch remus in checkpoint_device
  Update libxl_save_msgs_gen.pl to support return data from xl to xc
  Allow slave sends data to master
  secondary vm suspend/resume/checkpoint code
  primary vm suspend/get_dirty_pfn/resume/checkpoint code
  xc_domain_save: flush cache before calling callbacks->postcopy() in
    colo mode
  COLO: xc related codes
  send store mfn and console mfn to xl before resuming secondary vm
  implement the cmdline for COLO
  tools: xc_doamin_restore: zero ioreq page only one time
  block-colo: implement colo disk replication
  libxl/colo: setup and control disk replication for blktap2 backends
  setup and control colo-agent for primary vm
  setup and control colo-agent for secondary vm
  colo: cmdline switches and config vars to control colo-agent

 .gitignore                                         |    1 +
 docs/man/xl.conf.pod.5                             |    6 +
 docs/man/xl.pod.1                                  |   12 +-
 tools/blktap2/drivers/Makefile                     |    3 +
 tools/blktap2/drivers/block-colo.c                 | 1132 ++++++++++++++++++++
 tools/blktap2/drivers/block-remus.c                |    4 +-
 tools/blktap2/drivers/block-replication.c          |  262 ++++-
 tools/blktap2/drivers/block-replication.h          |   77 +-
 tools/blktap2/drivers/tapdisk-disktype.c           |    9 +
 tools/blktap2/drivers/tapdisk-disktype.h           |    3 +-
 tools/hotplug/Linux/Makefile                       |    2 +
 tools/hotplug/Linux/colo-agent-setup               |  210 ++++
 tools/hotplug/Linux/remus-netbuf-setup             |   45 +-
 tools/hotplug/Linux/xen-network-ft.sh              |  102 ++
 tools/libxc/include/xenguest.h                     |   40 +
 tools/libxc/xc_domain_restore.c                    |  106 +-
 tools/libxc/xc_domain_save.c                       |   63 +-
 tools/libxl/Makefile                               |   11 +-
 tools/libxl/colo-tc.c                              |  589 ++++++++++
 tools/libxl/libxl.c                                |   85 +-
 ...xl_remus_device.c => libxl_checkpoint_device.c} |  233 ++--
 tools/libxl/libxl_colo.h                           |   48 +
 tools/libxl/libxl_colo_nic.c                       |  312 ++++++
 tools/libxl/libxl_colo_restore.c                   | 1003 +++++++++++++++++
 tools/libxl/libxl_colo_save.c                      |  809 ++++++++++++++
 tools/libxl/libxl_colo_save_disk_blktap2.c         |  219 ++++
 tools/libxl/libxl_create.c                         |  152 ++-
 tools/libxl/libxl_dom.c                            |  234 +---
 tools/libxl/libxl_internal.h                       |  218 ++--
 tools/libxl/libxl_netbuffer.c                      |  117 +-
 tools/libxl/libxl_noblktap2.c                      |   29 +
 tools/libxl/libxl_nonetbuffer.c                    |   10 +-
 tools/libxl/libxl_remus.c                          |  373 +++++++
 tools/libxl/libxl_remus.h                          |   27 +
 tools/libxl/libxl_remus_disk_drbd.c                |   57 +-
 tools/libxl/libxl_save_callout.c                   |   37 +-
 tools/libxl/libxl_save_helper.c                    |   17 +
 tools/libxl/libxl_save_msgs_gen.pl                 |   74 +-
 tools/libxl/libxl_types.idl                        |   14 +-
 tools/libxl/xl.c                                   |    3 +
 tools/libxl/xl.h                                   |    1 +
 tools/libxl/xl_cmdimpl.c                           |  100 +-
 tools/libxl/xl_cmdtable.c                          |    4 +-
 43 files changed, 6164 insertions(+), 689 deletions(-)
 create mode 100644 tools/blktap2/drivers/block-colo.c
 create mode 100755 tools/hotplug/Linux/colo-agent-setup
 create mode 100644 tools/hotplug/Linux/xen-network-ft.sh
 create mode 100644 tools/libxl/colo-tc.c
 rename tools/libxl/{libxl_remus_device.c => libxl_checkpoint_device.c} (47%)
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_nic.c
 create mode 100644 tools/libxl/libxl_colo_restore.c
 create mode 100644 tools/libxl/libxl_colo_save.c
 create mode 100644 tools/libxl/libxl_colo_save_disk_blktap2.c
 create mode 100644 tools/libxl/libxl_remus.c
 create mode 100644 tools/libxl/libxl_remus.h

-- 
1.9.3
