From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
To: Eric Blake <eblake@redhat.com>,
	amit.shah@redhat.com, quintela@redhat.com
Cc: xiecl.fnst@cn.fujitsu.com, lizhijian@cn.fujitsu.com,
	zhangchen.fnst@cn.fujitsu.com, qemu-devel@nongnu.org,
	dgilbert@redhat.com
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v20 16/17] docs: Add documentation for COLO feature
Date: Sat, 8 Oct 2016 17:32:21 +0800
Message-ID: <57F8BD25.8020809@huawei.com>
In-Reply-To: <52ae0dae-e536-7362-e175-e3e36a130026@redhat.com>

On 2016/10/5 21:37, Eric Blake wrote:
> On 09/29/2016 03:46 AM, zhanghailiang wrote:
>> Introduce the design of COLO, and how to test it.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   docs/COLO-FT.txt | 190 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 190 insertions(+)
>>   create mode 100644 docs/COLO-FT.txt
>>
>
>> +
>> +== Background ==
>> +Virtual machine (VM) replication is a well known technique for providing
>> +application-agnostic software-implemented hardware fault tolerance
>> +"non-stop service".
>
> Do you want s/tolerance/tolerance, also known as/ ?
>

Yes, that is more appropriate.

>
>> +== Architecture ==
>> +
>> +The architecture of COLO is shown in the bellow diagram.
>
> s/bellow diagram/diagram below/
>

>> +It consists of a pair of networked physical nodes:
>> +The primary node running the PVM, and the secondary node running the SVM
>> +to maintain a valid replica of the PVM.
>> +PVM and SVM execute in parallel and generate output of response packets for
>> +client requests according to the application semantics.
>> +
>> +The incoming packets from the client or external network are received by the
>> +primary node, and then forwarded to the secondary node, so that Both the PVM
>
> s/Both/both/
>

>> +and the SVM are stimulated with the same requests.
>> +
>> +COLO receives the outbound packets from both the PVM and SVM and compares them
>> +before allowing the output to be sent to clients.
>> +
>> +The SVM is qualified as a valid replica of the PVM, as long as it generates
>> +identical responses to all client requests. Once the differences in the outputs
>> +are detected between the PVM and SVM, COLO withholds transmission of the
>> +outbound packets until it has successfully synchronized the PVM state to the SVM.
>> +
>
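
To illustrate the rule quoted above (release output while the two VMs agree, and withhold it until the state has been synchronized once they differ), here is a minimal Python sketch. It is not the COLO proxy code; the class name, the method names, and the checkpoint counter are hypothetical stand-ins, and the state synchronization is only counted, not performed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutputComparator:
    """Toy model of the compare-then-release rule (hypothetical names)."""
    released: List[bytes] = field(default_factory=list)
    checkpoints: int = 0

    def sync_state(self) -> None:
        # Stand-in for synchronizing the PVM state to the SVM.
        self.checkpoints += 1

    def on_output(self, pvm_packet: bytes, svm_packet: bytes) -> bool:
        """Return True if the outputs matched, False if a sync was needed."""
        matched = pvm_packet == svm_packet
        if not matched:
            # Outputs differ: hold the packet until the sync has completed.
            self.sync_state()
        # Only the PVM's packet is ever sent to the client.
        self.released.append(pvm_packet)
        return matched


comparator = OutputComparator()
comparator.on_output(b"reply-1", b"reply-1")   # identical, released directly
comparator.on_output(b"reply-2", b"reply-2x")  # differs, sync happens first
print(comparator.checkpoints, len(comparator.released))
```
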
>> +== Components introduction ==
>> +
>> +You can see there are several components in COLO's diagram of architecture.
>> +Their functions are described as bellow.
>
> s/as bellow/below/
>

>> +
>> +HeartBeat:
>> +Runs on both the primary and secondary nodes, to periodically check platform
>> +availability. When the primary node suffers a hardware fail-stop failure,
>> +the heartbeat stops responding, the secondary node will trigger a failover
>> +as soon as it determines the absence.
>> +
>> +COLO disk Manager:
>> +When primary VM writes data into image, the colo disk manger captures this data
>> +and send it to secondary VM’s which makes sure the context of secondary VM's
>
> s/send/sends/
>

>> +image is consentient with the context of primary VM 's image.
>
> s/consentient/consistent/
> s/VM 's/VM's/
>

>> +For more details, please refer to docs/block-replication.txt.
>> +
>> +Checkpoint/Failover Controller:
>> +Modifications of save/restore flow to realize continuous migration,
>> +to make sure the state of VM in Secondary side always be consistent with VM in
>
> s/always be/is always/
>

>> +Primary side.
>> +
>> +COLO Proxy:
>> +Delivers packets to Primary and Seconday, and then compare the responses from
>> +both side. Then decide whether to start a checkpoint according to some rules.
>> +
>> +Note:
>> + a. HeartBeat is not been realized, so you need to trigger failover process
>
> s/is/has/
> s/realized/implemented yet/
>
> Is this note going to be stale once heartbeat is implemented?
>

Yes, but we're not sure whether QEMU is the right place to implement it.

>> +    by using 'x-colo-lost-heartbeat' command.
>> + b. COLO proxy compents is work-in-process, it only support periodic checkpoint
>
> s/compents is/components are a/
>

>> +    mode now, just as Micro-checkpointing.
>> +
>
>> +3. On Primary VM's QEMU monitor, issue command:
>> +{'execute':'qmp_capabilities'}
>> +{ 'execute': 'human-monitor-command',
>> +  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=colo-disk0,node-name=node0'}}
>
> It would be really nice if we could get this done through QMP
> blockdev-add instead of HMP drive_add.
>

You are right, but upstream blockdev-add doesn't support NBD drives yet.
I saw that Max has sent a patch set to support it. I will update this after
his patches have been merged.

>> +
>> +Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
>> +issue block related command to stop block replication.
>> +Primary:
>> +  Remove the nbd child from the quorum:
>> +  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
>> +  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
>> +  Note: there is no qmp command to remove the blockdev now
>
> Don't we have x-blockdev-del?
>

Yes, we can use this command; I'll fix it in the next version.
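
For reference, the primary-side sequence as currently written in the doc can be sketched as a small Python helper that builds the QMP messages in order. This only constructs and prints JSON; it does not talk to a monitor socket. The function name is hypothetical, and the sketch keeps the HMP drive_del step from the quoted text rather than guessing at x-blockdev-del arguments, which are not shown in this thread.

```python
import json


def primary_failover_commands(parent="colo-disk0",
                              child="children.1",
                              drive="blk-buddy0"):
    """Return the QMP messages for the primary side, in the order to send."""
    return [
        # Remove the nbd child from the quorum.
        {"execute": "x-blockdev-change",
         "arguments": {"parent": parent, "child": child}},
        # Delete the drive through HMP, as in the current doc text.
        {"execute": "human-monitor-command",
         "arguments": {"command-line": f"drive_del {drive}"}},
        # Only after block replication is stopped, trigger failover.
        {"execute": "x-colo-lost-heartbeat"},
    ]


for command in primary_failover_commands():
    print(json.dumps(command))
```

The default values are the ones used in the quoted example; other setups would pass their own node and drive names.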

>> +
>> +Secondary:
>> +  The primary host is down, so we should do the following thing:
>> +  { 'execute': 'nbd-server-stop' }
>> +
>> +== TODO ==
>> +1. Support continuously VM replication.
>
> s/continuously/continuous/
>
>> +2. Support shared storage.
>> +3. Develop the heartbeat part.
>> +4. Reduce checkpoint VM’s downtime while do checkpoint.
>
> s/do/doing/
>
>>

All the above typos and grammatical mistakes will be fixed in the next version, thanks!

Hailiang
>

Thread overview: 26+ messages
2016-09-29  8:46 [Qemu-devel] [PATCH COLO-Frame (Base) v20 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 01/17] migration: Introduce capability 'x-colo' to migration zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 02/17] COLO: migrate COLO related info to secondary node zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 03/17] migration: Enter into COLO mode after migration if COLO is enabled zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 04/17] migration: Switch to COLO process after finishing loadvm zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 05/17] COLO: Establish a new communicating path for COLO zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 06/17] COLO: Introduce checkpointing protocol zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 07/17] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 08/17] COLO: Send PVM state to secondary side when do checkpoint zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 09/17] COLO: Load VMState into QIOChannelBuffer before restore it zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 10/17] COLO: Add checkpoint-delay parameter for migrate-set-parameters zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 11/17] COLO: Synchronize PVM's state to SVM periodically zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 12/17] COLO: Add 'x-colo-lost-heartbeat' command to trigger failover zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 13/17] COLO: Introduce state to record failover process zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 14/17] COLO: Implement the process of failover for primary VM zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 15/17] COLO: Implement failover work for secondary VM zhanghailiang
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 16/17] docs: Add documentation for COLO feature zhanghailiang
2016-09-29 11:45   ` Jonathan Neuschäfer
2016-10-05 13:37   ` Eric Blake
2016-10-08  9:32     ` Hailiang Zhang [this message]
2016-09-29  8:46 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 17/17] configure: Support enable/disable " zhanghailiang
2016-09-29 12:10 ` [Qemu-devel] [PATCH COLO-Frame (Base) v20 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) no-reply
2016-09-30  5:53 ` Amit Shah
2016-09-30  6:27   ` Hailiang Zhang
2016-10-05 12:13     ` Amit Shah
2016-10-09  1:21       ` Hailiang Zhang
