From: Eric Blake <eblake@redhat.com>
To: Yang Hongyang <yanghy@cn.fujitsu.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Lai Jiangshan <laijs@cn.fujitsu.com>,
	quintela@redhat.com, GuiJianfeng@cn.fujitsu.com,
	yunhong.jiang@intel.com, eddie.dong@intel.com,
	dgilbert@redhat.com, mrhines@linux.vnet.ibm.com,
	stefanha@redhat.com, Amit Shah <amit.shah@redhat.com>,
	pbonzini@redhat.com, walid.nouri@gmail.com
Subject: Re: [Qemu-devel] [PATCH RESEND 1/2] Block: Block replication design for COLO
Date: Wed, 25 Mar 2015 10:06:53 -0600
Message-ID: <5512DD1D.9020704@redhat.com>
In-Reply-To: <1419564708-17714-2-git-send-email-yanghy@cn.fujitsu.com>

On 12/25/2014 08:31 PM, Yang Hongyang wrote:
> This is the initial design of block replication.
> The blkcolo block driver enables disk replication for continuous
> checkpoints. It is designed for COLO that Secondary VM is running.
> It can also be applied for FT/HA scene that Secondary VM is not
> running.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  docs/blkcolo.txt | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 85 insertions(+)
>  create mode 100644 docs/blkcolo.txt

Grammar review only (I'll leave the technical review to others)

> 
> diff --git a/docs/blkcolo.txt b/docs/blkcolo.txt
> new file mode 100644
> index 0000000..41c2a05
> --- /dev/null
> +++ b/docs/blkcolo.txt
> @@ -0,0 +1,85 @@
> +Disk replication using blkcolo
> +----------------------------------------
> +Copyright Fujitsu, Corp. 2014

Visually, the separator line should match the length of the line above,
and maybe have a blank line after.

> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +The blkcolo block driver enables disk replication for continuous checkpoints.
> +It is designed for COLO that Secondary VM is running. It can also be applied

Similar comments as for Wen's RFC COLO v2 series for
docs/block-replication.txt (in fact, do we need two files, or should all
this information be merged into a single file?):

s/for COLO that/for COLO (COarse-grain LOck-stepping replication), where/

> +for FT/HA scene that Secondary VM is not running.

s|for FT/HA scene that|to FT/HA (Fault-tolerance/High availability)
scenarios, where|

> +
> +This document gives an overview of blkcolo's design.
> +
> +== Background ==
> +High availability solutions such as micro checkpoint and COLO will do
> +consecutive checkpoint. The VM state of Primary VM and Secondary VM is

s/checkpoint/checkpoints/

> +identical right after a VM checkpoint, but becomes different as the VM
> +executes till the next checkpoint. To support disk contents checkpoint,
> +the modified disk contents in the Secondary VM must be buffered, and are
> +only dropped at next checkpoint time. To reduce the network transportation
> +effort at the time of checkpoint, the disk modification operations of
> +Primary disk are asynchronously forwarded to the Secondary node.
> +
> +== Disk Buffer ==
> +The following is the image of Disk buffer:
> +
> +        +----------------------+            +------------------------+
> +        |Primary Write Requests|            |Secondary Write Requests|
> +        +----------------------+            +------------------------+
> +                  |                                       |
> +                  |                                      (4)
> +                  |                                       V
> +                  |                              /-------------\
> +                  |      Copy and Forward        |             |
> +                  |---------(1)----------+       | Disk Buffer |
> +                  |                      |       |             |
> +                  |                     (3)      \-------------/
> +                  |                 speculative      ^
> +                  |                write through    (2)
> +                  |                      |           |
> +                  V                      V           |
> +           +--------------+           +----------------+
> +           | Primary Disk |           | Secondary Disk |
> +           +--------------+           +----------------+
> +    1) Primary write requests will be copied and forwarded to Secondary
> +       QEMU.
> +    2) Before Primary write requests are written to Secondary disk, the
> +       original sector content will be read from Secondary disk and
> +       buffered in the Disk buffer, but it will not overwrite the existing
> +       sector content in the Disk buffer.
> +    3) Primary write requests will be written to Secondary disk.
> +    4) Secondary write requests will be bufferd in the Disk buffer and it

s/bufferd/buffered/

> +       will overwrite the existing sector content in the buffer.
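
Steps 1) through 4) can be sketched as a toy Python model. This is not
QEMU code and not taken from the patch: the class name, the method names,
the dict-per-sector storage and the read path (Secondary reads consult the
buffer first) are all assumptions made for illustration.

```python
# Toy model of the Secondary-side Disk buffer described in steps 1-4.
# Hypothetical names throughout; dicts stand in for real block devices.

ZERO = b"\0" * 512  # assumed content of a sector that was never written


class SecondaryDisk:
    def __init__(self):
        self.disk = {}    # sector number -> content of the Secondary disk
        self.buffer = {}  # sector number -> content of the Disk buffer

    def primary_write(self, sector, data):
        """A write request forwarded from the Primary (step 1)."""
        # Step 2: save the original sector content, but do not overwrite
        # content that is already in the Disk buffer.
        if sector not in self.buffer:
            self.buffer[sector] = self.disk.get(sector, ZERO)
        # Step 3: write the Primary data through to the Secondary disk.
        self.disk[sector] = data

    def secondary_write(self, sector, data):
        """A write request issued by the Secondary VM (step 4)."""
        # Buffered only, and it replaces any existing buffered content.
        self.buffer[sector] = data

    def secondary_read(self, sector):
        # Assumption: the Secondary VM sees buffered content first and
        # falls back to the Secondary disk otherwise.
        return self.buffer.get(sector, self.disk.get(sector, ZERO))
```

Under these assumptions, the Secondary VM keeps seeing its own view of a
sector even after Primary writes land on the Secondary disk, which seems
to be the point of step 2.
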
> +
> +== Capture I/O request ==
> +The blkcolo is a new block driver protocol, so all I/O requests can be
> +captured in the driver interface bdrv_co_readv()/bdrv_co_writev().
> +
> +== Checkpoint & failover ==
> +The blkcolo buffers the write requests in Secondary QEMU. And the buffer
> +should be dropped at a checkpoint, or be flushed to Secondary disk when

s/when/on/

> +failover. We add four block driver interfaces to do this:
> +a. bdrv_prepare_checkpoint()
> +   This interface may block, and return when all Primary write

s/return/returns/

> +   requests are forwarded to Secondary QEMU.
> +b. bdrv_do_checkpoint()
> +   This interface is called after all VM state is transfered to

s/transfered/transferred/

> +   Secondary QEMU. The Disk buffer will be dropped in this interface.
> +c. bdrv_get_sent_data_size()
> +   This is used on Primary node.
> +   It should be called by migration/checkpoint thread in order
> +   to decide whether to start a new checkpoint or not. If the data
> +   amount being sent is too large, we should start a new checkpoint.
> +d. bdrv_stop_replication()
> +   It is called when failover. We will flush the Disk buffer into

s/when/on/

> +   Secondary Disk and stop disk replication.
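
The difference between dropping the buffer at a checkpoint and flushing it
on failover can be sketched in a few lines of Python. The method names
only echo the bdrv_*() names above; this is a hypothetical model, not the
proposed QEMU interface, and the byte limit used to decide on a new
checkpoint is an assumed parameter that the document does not define.

```python
# Toy model of the checkpoint and failover behaviour of the Disk buffer.
# Hypothetical names; dicts stand in for the Secondary disk and the buffer.

class ReplicatedDisk:
    def __init__(self, disk=None):
        self.disk = dict(disk or {})  # sector -> Secondary disk content
        self.buffer = {}              # sector -> Disk buffer content

    def do_checkpoint(self):
        # At a checkpoint the buffered content is simply dropped.
        self.buffer.clear()

    def stop_replication(self):
        # On failover the buffer is flushed into the Secondary disk,
        # then replication stops and the buffer is no longer needed.
        self.disk.update(self.buffer)
        self.buffer.clear()


def need_checkpoint(sent_bytes, limit_bytes):
    # Assumed policy for the value reported on the Primary node: start a
    # new checkpoint once more than limit_bytes have been sent.
    return sent_bytes > limit_bytes
```
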
> +
> +== Usage ==
> +On both Primary/Secondary host, invoke QEMU with the following parameters:
> +    "-drive file=blkcolo:host:port:/path/to/image"
> +a. host
> +   Hostname or IP of the Secondary host.
> +b. port
> +   The Secondary QEMU will listen on this port, and the Primary QEMU
> +   will connect to this port.
> 
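
For illustration, the filename syntax above splits into its fields as in
the Python sketch below. The parser and its names are hypothetical (the
real parsing would live in the block driver), and it assumes the host part
contains no ':' character, so a bare IPv6 address would not work with it.

```python
# Hypothetical parser for "blkcolo:host:port:/path/to/image".
from typing import NamedTuple


class BlkcoloTarget(NamedTuple):
    host: str  # hostname or IP of the Secondary host
    port: int  # port the Secondary QEMU listens on
    path: str  # path to the disk image


def parse_blkcolo(filename):
    # Split on the first three colons only, so the path keeps any ':'.
    # A string with too few fields fails the unpacking with ValueError.
    scheme, host, port, path = filename.split(":", 3)
    if scheme != "blkcolo":
        raise ValueError("not a blkcolo filename: %r" % filename)
    return BlkcoloTarget(host, int(port), path)
```
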

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



