From: Hongyang Yang <yanghy@cn.fujitsu.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: ian.campbell@citrix.com, wency@cn.fujitsu.com,
ian.jackson@eu.citrix.com, yunhong.jiang@intel.com,
eddie.dong@intel.com, xen-devel@lists.xen.org,
rshriram@cs.ubc.ca
Subject: Re: [RFC PATCH COLO v5 01/29] Add readme
Date: Tue, 14 Apr 2015 12:06:56 +0800
Message-ID: <552C9260.4040005@cn.fujitsu.com>
In-Reply-To: <20150408181120.GP30811@zion.uk.xensource.com>
Wei,
Thanks for the review, and sorry for the late reply; I was debugging a triple
fault bug. It is now fixed, and COLO is running much more stably.
The readme is somewhat outdated. We will update it, address your comments,
and then put it on a wiki page. The in-tree readme will be a short
description with links to the wiki pages.
On 04/09/2015 02:11 AM, Wei Liu wrote:
> On Wed, Apr 01, 2015 at 02:41:37PM +0800, Yang Hongyang wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>> docs/README.colo | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 92 insertions(+)
>> create mode 100644 docs/README.colo
>>
>> diff --git a/docs/README.colo b/docs/README.colo
>> new file mode 100644
>> index 0000000..60f487d
>> --- /dev/null
>> +++ b/docs/README.colo
>> @@ -0,0 +1,92 @@
>> +COLO provides fault tolerance for virtual machines by sending continuous
>> +checkpoints to a backup, which will activate if the target VM fails. It
>> +only supports HVM guest(without pv extensions).
> ^ PV
>
>> +
>> +Requriements:
>> +1. Hardware requriements
>> + There is at least one directly connected nic to forward the nic from client
>> + to secondary vm. The directly connected nic must not be used by any other
>> + purpose. If your guest has more than one nic, you should have directly
>> + connected nic for each guest nic. If you don't have enouth directly connected
>> + nic, you can use vlan.
>> +2. Dom0 requirements
>> + - Support dom0
>> + - kernel module:
>> + sch_ingress
>> + cls_basic
>> + cls_tcindex
>> + cls_u32
>> + act_mirred
>> + - libnl-tools >= 3.0. This package provides the command nl-qdisc-list, and
>> + colo need this command.
> ^ COLO
>
>> + - If your host os has OEM-released xen tools, please uninstall it first.
> ^OS ^ Xen
>
> (and please fix other occurrences of wrong capitalisations as well)
OK
>
> This is a very broad statement and it is not very helpful from both
> developers and users' point of view. Can you elaborate on what
> functionalities that COLO needs to have exclusive access to?
>
>> + - You can load the module which is not provided by OEM.
>
> What does this mean?
It means you may need to compile the module yourself. We will make that clear.
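As a rough illustration of checking for the Dom0 modules listed in the readme, here is a minimal Python sketch. It assumes the module names appear in /proc/modules exactly as written in the readme; the function name missing_modules is hypothetical and not part of any COLO or Xen tooling. A module built into the kernel does not show up in /proc/modules, so a "missing" result does not by itself mean the module must be compiled.

```python
# Module names taken from the "Dom0 requirements" list in the quoted readme.
REQUIRED_MODULES = ["sch_ingress", "cls_basic", "cls_tcindex", "cls_u32", "act_mirred"]


def missing_modules(proc_modules_text, required=REQUIRED_MODULES):
    """Return the required modules that are not listed in /proc/modules text.

    Each line of /proc/modules starts with the module name. Modules that are
    built into the kernel are not listed there, so this is only a hint.
    """
    loaded = {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    print("not loaded:", missing_modules(text))
```

Anything reported as not loaded can then be tried with modprobe before deciding whether it has to be built by hand.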
>
>> +3. Guest requirements
>> + Only HVM guest(without pv extensions) is supported now. If you want to
>> + use OEM released guest os, please use SUSE. REDHAT and Ubuntu is not
>> + supported now because I don't find any way to disable pv extensions.
>> + If you want to use REDHAT or Ubuntu, you need to build the newest
>> + kernel which has the parameter xen_nopv.
>> +
>
> FWIW, does "xen_platform_pci=0" in your xl.cfg work for RH and Ubuntu
> guests?
It works only if you build a newer kernel yourself, one that supports this
option.
>
>> +Network link topology
>> + Please refer to: http://wiki.qemu.org/Features/COLO#Network_link_topology
>> +
>> +The steps to setup COLO environment:
>> +You need to recompile your host kernel because colo-proxy module need cooperate
>> +with linux kernel.
>> +Please refer to: http://wiki.qemu.org/Features/COLO#Test_environment_prepare
>> +1. Build and install xen
>> +2. Apply the patch for qemu xen, and rebuild xen tools:
>> + - cd tools/qemu-xen-dir
>> + - use git am to apply the patch:
>> + https://raw.githubusercontent.com/wencongyang/colo-files/master/patch_for_qemu/*.patch
>> + - make tools && make install-tools
>> + Note: You must use qemu-xen. qemu-xen-traditional is not supported.
>
> Note that you will eventually need to upstream your changes to QEMU.
Sure, we have already posted the block patches to QEMU. They are under review now:
http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg00399.html
>
>> +3. Install COLO proxy module:
>> + 3.1 Download COLO proxy, compile and install it:
>> + https://github.com/gao-feng/colo-proxy.git
>> + 3.2 Download iptables patch, it is based on v1.4.21 compile and install it:
>> + https://github.com/gao-feng/colo-proxy/blob/master/colo-patch-for-kernel.patch
>> +4. Install the guest
>> + 4.1 Add "xen_platform_pci=0" into the guest configfile
>> + 4.2 If you use suse, please select physical machine
>> + 4.3 copy the disk image to the secondary host
>> +5. Update your guest config file for COLO:
>> + 5.1 disk
>> + disk = [
>> + 'format=raw,devtype=disk,access=w,vdev=hda,backendtype=qdisk,colo,colo-params=192.168.3.1:9000:exportname=qdisk1,active-disk=/mnt/ramfs/active_disk.img,hidden-disk=/mnt/ramfs/hidden_disk.img,target=/root/images/colo-hvm.img' ]
>
> It's unclear which parts are updated compared to the original config,
> i.e. can you list the additional bits to enable COLO? Presumably it's
> only those options starting with "colo"?
Mostly. We will update the doc to make it clear.
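To illustrate the question, here is a minimal Python sketch that splits the disk line from the readme and picks out the keys that start with "colo". It is a naive comma split for illustration only, not libxl's real disk-spec parser, and the helper names are hypothetical. Since the answer is only "mostly", other keys in the line (for example active-disk and hidden-disk) may also be COLO additions; the sketch makes no claim about them.

```python
# Disk spec copied from step 5.1 of the quoted readme.
DISK_SPEC = (
    "format=raw,devtype=disk,access=w,vdev=hda,backendtype=qdisk,"
    "colo,colo-params=192.168.3.1:9000:exportname=qdisk1,"
    "active-disk=/mnt/ramfs/active_disk.img,"
    "hidden-disk=/mnt/ramfs/hidden_disk.img,"
    "target=/root/images/colo-hvm.img"
)


def split_disk_spec(spec):
    """Split a comma-separated spec into (key, value) pairs.

    Bare flags such as "colo" get the value None. Only the first "=" in an
    item separates key from value, so "exportname=qdisk1" stays inside the
    value of colo-params.
    """
    pairs = []
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        pairs.append((key.strip(), value if sep else None))
    return pairs


def colo_prefixed(spec):
    """Return only the pairs whose key starts with "colo"."""
    return [(k, v) for k, v in split_disk_spec(spec) if k.startswith("colo")]


if __name__ == "__main__":
    for key, value in colo_prefixed(DISK_SPEC):
        print(key, value)
```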
>
>> + 5.2 nic
>> + vif = [ 'mac=00:16:4f:00:00:11, bridge=br0, model=e1000, forwarddev=eth0, forwardbr=br1' ]
>> + Note:
>> + a. The ip/port in colo-params is the secondary host's IP. Don't use the
>> + directly connected nic's IP.
>> + b. forwarddev is the directly connected nic.
>> + c. If you have more than one disk, colo-params's host/port must be the same
>> + and colo-param's exportname must be different.
>> +6. Run COLO:
>> + xl remus -c -u <domname> <secondary host IP>
>> + Note: The ip must not be the directly connected nic's IP.
>> +Note:
>> +Secondary host only need to do step 1-3.
>> +
>> +The known problem:
>> +1. Secondary vm may crash due to triple fault.
>> +2. The heartbeat is not reliable. If you want to test the performance,
>> + please disable the heartbeat(modify the xen codes). You can use the
>> + branch colo-v4-noheartbeat.
>> +3. Suspending the vm fails, and the error message is:
>> + libxl: error: libxl_qmp.c:429:qmp_next: timeout
>> +
>> +Problem 1 and 3 don't happen every time. So you can run colo again to
>> +avoid this problem.
>> +
>> +Virtio-Net:
>> +1. If you want to get better performance, you can use virtio-net.
>> +
>> +Trouble shooting:
>> +If there's some error happend when staritng COLO, you can do:
>> +1. Make sure you have all necessary modules that DOM0 needed on both side.
>> +2. Make sure you have followed all the instructions in this README.
>> +3. Try to reboot both primary and secondary host.
>> +4. If you still have problems, collect the error logs and contact
>> + Wen Congyang(wency@cn.fujitsu.com)/Yang Hongyang(yanghy@cn.fujitsu.com).
>
> After reading this whole document I think it should be a wiki page
> instead of an in-tree README.
Agreed.
>
> Wei.
>
>> --
>> 1.9.1
> .
>
--
Thanks,
Yang.