From: Eric Blake <eblake@redhat.com>
To: Yang Hongyang <yanghy@cn.fujitsu.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Lai Jiangshan <laijs@cn.fujitsu.com>,
quintela@redhat.com, GuiJianfeng@cn.fujitsu.com,
yunhong.jiang@intel.com, eddie.dong@intel.com,
dgilbert@redhat.com, mrhines@linux.vnet.ibm.com,
stefanha@redhat.com, Amit Shah <amit.shah@redhat.com>,
pbonzini@redhat.com, walid.nouri@gmail.com
Subject: Re: [Qemu-devel] [PATCH RESEND 1/2] Block: Block replication design for COLO
Date: Wed, 25 Mar 2015 10:06:53 -0600
Message-ID: <5512DD1D.9020704@redhat.com>
In-Reply-To: <1419564708-17714-2-git-send-email-yanghy@cn.fujitsu.com>
On 12/25/2014 08:31 PM, Yang Hongyang wrote:
> This is the initial design of block replication.
> The blkcolo block driver enables disk replication for continuous
> checkpoints. It is designed for COLO that Secondary VM is running.
> It can also be applied for FT/HA scene that Secondary VM is not
> running.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
> docs/blkcolo.txt | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 85 insertions(+)
> create mode 100644 docs/blkcolo.txt
Grammar review only (I'll leave the technical review to others).
>
> diff --git a/docs/blkcolo.txt b/docs/blkcolo.txt
> new file mode 100644
> index 0000000..41c2a05
> --- /dev/null
> +++ b/docs/blkcolo.txt
> @@ -0,0 +1,85 @@
> +Disk replication using blkcolo
> +----------------------------------------
> +Copyright Fujitsu, Corp. 2014
Visually, the separator line should match the length of the line above,
and maybe have a blank line after.
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +The blkcolo block driver enables disk replication for continuous checkpoints.
> +It is designed for COLO that Secondary VM is running. It can also be applied
Similar comments as for Wen's RFC COLO v2 series for
docs/block-replication.txt (in fact, do we need two files, or should all
this information be merged into a single file?):
s/for COLO that/for COLO (COarse-grain LOck-stepping replication), where/
> +for FT/HA scene that Secondary VM is not running.
s/for FT/HA scene that/to FT/HA (Fault-tolerance/High availability)
scenarios, where/
> +
> +This document gives an overview of blkcolo's design.
> +
> +== Background ==
> +High availability solutions such as micro checkpoint and COLO will do
> +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
s/checkpoint/checkpoints/
> +identical right after a VM checkpoint, but becomes different as the VM
> +executes till the next checkpoint. To support disk contents checkpoint,
> +the modified disk contents in the Secondary VM must be buffered, and are
> +only dropped at next checkpoint time. To reduce the network transportation
> +effort at the time of checkpoint, the disk modification operations of
> +Primary disk are asynchronously forwarded to the Secondary node.
> +
> +== Disk Buffer ==
> +The following is the image of Disk buffer:
> +
> +    +----------------------+            +------------------------+
> +    |Primary Write Requests|            |Secondary Write Requests|
> +    +----------------------+            +------------------------+
> +              |                                       |
> +              |                                      (4)
> +              |                                       V
> +              |                                /-------------\
> +              |      Copy and Forward          |             |
> +              |---------(1)----------+         | Disk Buffer |
> +              |                      |         |             |
> +              |                     (3)        \-------------/
> +              |                 speculative      ^
> +              |                write through    (2)
> +              |                      |           |
> +              V                      V           |
> +       +--------------+            +----------------+
> +       | Primary Disk |            | Secondary Disk |
> +       +--------------+            +----------------+
> + 1) Primary write requests will be copied and forwarded to Secondary
> + QEMU.
> + 2) Before Primary write requests are written to Secondary disk, the
> + original sector content will be read from Secondary disk and
> + buffered in the Disk buffer, but it will not overwrite the existing
> + sector content in the Disk buffer.
> + 3) Primary write requests will be written to Secondary disk.
> + 4) Secondary write requests will be bufferd in the Disk buffer and it
s/bufferd/buffered/
> + will overwrite the existing sector content in the buffer.
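To check my reading of steps 2) through 4), here is a toy model in Python. It is not QEMU code: the class and method names are made up for illustration, sectors are just dictionary keys, and the read path (buffer first, then disk) is my assumption, since the text above does not spell it out.

```python
class SecondarySide:
    """Toy model of the Secondary node: a disk plus the Disk buffer."""

    def __init__(self, sectors):
        self.disk = dict(sectors)  # sector -> content on Secondary disk
        self.buffer = {}           # sector -> content held in the Disk buffer

    def primary_write(self, sector, data):
        # Step 2: save the original content first, but never overwrite
        # an entry that is already in the buffer.
        if sector not in self.buffer:
            self.buffer[sector] = self.disk.get(sector)
        # Step 3: write the forwarded Primary data to Secondary disk.
        self.disk[sector] = data

    def secondary_write(self, sector, data):
        # Step 4: Secondary writes go to the buffer and always overwrite.
        self.buffer[sector] = data

    def secondary_read(self, sector):
        # Assumption: reads prefer the buffer and fall back to the disk.
        if sector in self.buffer:
            return self.buffer[sector]
        return self.disk.get(sector)


s = SecondarySide({0: "old0", 1: "old1"})
s.primary_write(0, "P0")    # original "old0" is saved in the buffer
s.secondary_write(1, "S1")  # Secondary's own data lands in the buffer
s.primary_write(1, "P1")    # buffer already has sector 1, so it is kept
```

If that matches the intent, the Secondary VM keeps seeing "old0" and "S1" while the Secondary disk already holds "P0" and "P1"; it might be worth stating the read path explicitly in the document.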
> +
> +== Capture I/O request ==
> +The blkcolo is a new block driver protocol, so all I/O requests can be
> +captured in the driver interface bdrv_co_readv()/bdrv_co_writev().
> +
> +== Checkpoint & failover ==
> +The blkcolo buffers the write requests in Secondary QEMU. And the buffer
> +should be dropped at a checkpoint, or be flushed to Secondary disk when
s/when/on/
> +failover. We add four block driver interfaces to do this:
> +a. bdrv_prepare_checkpoint()
> + This interface may block, and return when all Primary write
s/return/returns/
> + requests are forwarded to Secondary QEMU.
> +b. bdrv_do_checkpoint()
> + This interface is called after all VM state is transfered to
s/transfered/transferred/
> + Secondary QEMU. The Disk buffer will be dropped in this interface.
> +c. bdrv_get_sent_data_size()
> + This is used on Primary node.
> + It should be called by migration/checkpoint thread in order
> + to decide whether to start a new checkpoint or not. If the data
> + amount being sent is too large, we should start a new checkpoint.
> +d. bdrv_stop_replication()
> + It is called when failover. We will flush the Disk buffer into
s/when/on/
> + Secondary Disk and stop disk replication.
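The difference between b. and d. took me a second read, so here is how I understand it, again as a toy Python sketch rather than anything resembling the real driver. The class, the method names borrowed from the interfaces above, and the size limit in should_start_checkpoint() are all assumptions for illustration; the document does not say what "too large" means.

```python
class Replica:
    """Toy Secondary state: disk contents plus buffered sectors."""

    def __init__(self, disk, buffer):
        self.disk = dict(disk)
        self.buffer = dict(buffer)
        self.replicating = True

    def do_checkpoint(self):
        # Models bdrv_do_checkpoint(): the Disk buffer is dropped, so the
        # Secondary view falls back to what is on Secondary disk.
        self.buffer.clear()

    def stop_replication(self):
        # Models bdrv_stop_replication(): on failover the buffer is
        # flushed into Secondary disk and replication stops.
        self.disk.update(self.buffer)
        self.buffer.clear()
        self.replicating = False


def should_start_checkpoint(sent_bytes, limit_bytes):
    # Models the decision made with bdrv_get_sent_data_size();
    # limit_bytes is a hypothetical tunable, not something the patch defines.
    return sent_bytes > limit_bytes
```

In this reading a checkpoint discards everything buffered on the Secondary, while failover keeps it by writing it to Secondary disk.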
> +
> +== Usage ==
> +On both Primary/Secondary host, invoke QEMU with the following parameters:
> + "-drive file=blkcolo:host:port:/path/to/image"
> +a. host
> + Hostname or IP of the Secondary host.
> +b. port
> + The Secondary QEMU will listen on this port, and the Primary QEMU
> + will connect to this port.
>
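One thing the Usage section could make explicit is how the filename is split. A minimal sketch of the obvious parsing, in Python and with a made-up function name (this is not how the PoC parses it, as far as I know):

```python
def parse_blkcolo(filename):
    """Split 'blkcolo:host:port:/path' into (host, port, path)."""
    # Split on the first three colons only, so colons later in the
    # path are preserved.
    proto, host, port, path = filename.split(":", 3)
    if proto != "blkcolo":
        raise ValueError("not a blkcolo filename: %r" % filename)
    return host, int(port), path
```

Note that a naive split like this cannot handle an IPv6 literal as the host, so the document may want to say whether that is supported.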
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org