From: zhanghailiang <zhang.zhanghailiang@huawei.com>
To: stefanha@redhat.com, qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, kwolf@redhat.com,
xiecl.fnst@cn.fujitsu.com, zhangchen.fnst@cn.fujitsu.com,
zhanghailiang <zhang.zhanghailiang@huawei.com>,
Wen Congyang <wency@cn.fujitsu.com>
Subject: [Qemu-devel] [PATCH v4 1/6] docs/block-replication: Add description for shared-disk case
Date: Wed, 12 Apr 2017 22:05:16 +0800
Message-ID: <1492005921-15664-2-git-send-email-zhang.zhanghailiang@huawei.com>
In-Reply-To: <1492005921-15664-1-git-send-email-zhang.zhanghailiang@huawei.com>
Introduce the shared-disk block replication scenario
and describe how to use it.
Reviewed-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
---
docs/block-replication.txt | 139 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 135 insertions(+), 4 deletions(-)
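Illustrative aside, not part of the patch: the shared-disk workflow that
this patch documents (steps 1 to 4 of "Shared Disk Mode Workflow") can be
modelled in a few lines of Python. This is only a minimal sketch under
stated assumptions, not the replication driver's implementation. The class
and method names are hypothetical, sectors are modelled as dictionary keys,
and the checkpoint step assumes the Disk buffer is simply emptied at each
checkpoint.

```python
class SharedDiskModel:
    """Toy model of the shared-disk workflow (hypothetical names)."""

    def __init__(self, shared_disk):
        # sector number -> content; visible to Primary and Secondary alike
        self.shared_disk = dict(shared_disk)
        # Secondary-side Disk buffer, empty at the start of an epoch
        self.disk_buffer = {}

    def primary_write(self, sector, data):
        # (1) read the original content from the shared disk
        original = self.shared_disk.get(sector)
        # (2) forward it to the Disk buffer, but never overwrite content
        #     that is already buffered for this sector
        self.disk_buffer.setdefault(sector, original)
        # (3) only then write the new data to the shared disk
        self.shared_disk[sector] = data

    def secondary_write(self, sector, data):
        # (4) Secondary writes always overwrite the buffered content
        self.disk_buffer[sector] = data

    def secondary_read(self, sector):
        # The buffer sits on top of the shared disk (its backing file)
        if sector in self.disk_buffer:
            return self.disk_buffer[sector]
        return self.shared_disk.get(sector)

    def checkpoint(self):
        # Assumption: buffered content is dropped at checkpoint time
        self.disk_buffer.clear()


model = SharedDiskModel({0: "A", 1: "B"})
model.primary_write(0, "A1")       # Secondary still sees "A"
model.secondary_write(1, "S")      # Secondary's own write
model.primary_write(1, "B1")       # must not clobber "S" in the buffer
print(model.secondary_read(0), model.secondary_read(1), model.shared_disk)
```

In this model the Secondary keeps seeing the disk as it was at the last
checkpoint, plus its own writes, while the shared disk itself always
holds the Primary's latest data.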
diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..fbfe005 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
effort during a vmstate checkpoint, the disk modification operations of
the Primary disk are asynchronously forwarded to the Secondary node.
-== Workflow ==
+== Non-shared disk workflow ==
The following is the image of block replication workflow:
+----------------------+ +------------------------+
@@ -57,7 +57,7 @@ The following is the image of block replication workflow:
4) Secondary write requests will be buffered in the Disk buffer and it
will overwrite the existing sector content in the buffer.
-== Architecture ==
+== Non-shared disk architecture ==
We are going to implement block replication from many basic
blocks that are already in QEMU.
@@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
of the NBD server into the secondary disk. So before block replication,
the primary disk and secondary disk should contain the same data.
+== Shared Disk Mode Workflow ==
+The following diagram shows the shared-disk block replication workflow:
+
+ +----------------------+ +------------------------+
+ |Primary Write Requests| |Secondary Write Requests|
+ +----------------------+ +------------------------+
+ | |
+ | (4)
+ | V
+ | /-------------\
+ | (2)Forward and write through | |
+ | +--------------------------> | Disk Buffer |
+ | | | |
+ | | \-------------/
+ | |(1)read |
+ | | |
+ (3)write | | | backing file
+ V | |
+ +-----------------------------+ |
+ | Shared Disk | <-----+
+ +-----------------------------+
+
+ 1) For each Primary write, the original data is first read from the
+ Shared disk so that it can be forwarded to the Secondary QEMU.
+ 2) Before Primary write requests are written to Shared disk, the
+ original sector content will be read from Shared disk and
+ forwarded and buffered in the Disk buffer on the secondary site,
+ but it will not overwrite the existing sector content (it could be
+ from either "Secondary Write Requests" or previous COW of "Primary
+ Write Requests") in the Disk buffer.
+ 3) Primary write requests will be written to Shared disk.
+ 4) Secondary write requests will be buffered in the Disk buffer and
+ will overwrite the existing sector content in the buffer.
+
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+ virtio-blk || .----------
+ / || | Secondary
+ / || '----------
+ / || virtio-blk
+ / || |
+ | || replication(5)
+ | NBD --------> NBD (2) |
+ | client || server ---> hidden disk <-- active disk(4)
+ | ^ || |
+ | replication(1) || |
+ | | || |
+ | +-----------------' || |
+ (3) |drive-backup sync=none || |
+--------. | +-----------------+ || |
+Primary | | | || backing |
+--------' | | || |
+ V | |
+ +-------------------------------------------+ |
+ | shared disk | <----------+
+ +-------------------------------------------+
+
+
+ 1) Primary writes will read original data and forward it to Secondary
+ QEMU.
+ 2) The hidden disk buffers the original content of sectors that are
+ modified by the primary VM. It should start out empty, and its driver
+ must support bdrv_make_empty() and backing files.
+ 3) Primary write requests will be written to Shared disk.
+ 4) Secondary write requests will be buffered in the active disk and
+ will overwrite the existing sector content in the buffer.
+
== Failure Handling ==
There are 7 internal errors when block replication is running:
1. I/O error on primary disk
@@ -145,7 +213,7 @@ d. replication_stop_all()
things except failover. The caller must hold the I/O mutex lock if it is
in migration/checkpoint thread.
-== Usage ==
+== Non-shared disk usage ==
Primary:
-drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
children.0.file.filename=1.raw,\
@@ -234,6 +302,69 @@ Secondary:
The primary host is down, so we should do the following thing:
{ 'execute': 'nbd-server-stop' }
+== Shared disk usage ==
+Primary:
+ -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
+
+Issue QMP command:
+ { 'execute': 'blockdev-add',
+ 'arguments': {
+ 'driver': 'replication',
+ 'node-name': 'rep',
+ 'mode': 'primary',
+ 'shared-disk-id': 'primary_disk0',
+ 'shared-disk': true,
+ 'file': {
+ 'driver': 'nbd',
+ 'export': 'hidden_disk0',
+ 'server': {
+ 'type': 'inet',
+ 'data': {
+ 'host': 'xxx.xxx.xxx.xxx',
+ 'port': 'yyy'
+ }
+ }
+ }
+ }
+ }
+
+Secondary:
+ -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+ backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+ file.driver=qcow2,top-id=active-disk0,\
+ file.file.filename=/mnt/ramfs/active_disk.img,\
+ file.backing=hidden_disk0,shared-disk=on
+
+Issue QMP commands:
+1. { 'execute': 'nbd-server-start',
+ 'arguments': {
+ 'addr': {
+ 'type': 'inet',
+ 'data': {
+ 'host': '0',
+ 'port': 'yyy'
+ }
+ }
+ }
+ }
+2. { 'execute': 'nbd-server-add',
+ 'arguments': {
+ 'device': 'hidden_disk0',
+ 'writable': true
+ }
+ }
+
+After Failover:
+Primary:
+ { 'execute': 'x-blockdev-del',
+ 'arguments': {
+ 'node-name': 'rep'
+ }
+ }
+
+Secondary:
+ { 'execute': 'nbd-server-stop' }
+
TODO:
1. Continuous block replication
-2. Shared disk
--
1.8.3.1
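
Illustrative aside, not part of the patch: the Primary-side blockdev-add
command from the "Shared disk usage" section can be built as a Python
dict and serialized with json.dumps before it is sent over a QMP socket.
This is a minimal sketch. The function name and its default arguments are
hypothetical, 'shared-disk' and 'shared-disk-id' are the options proposed
by this series, and the host and port used here (192.0.2.10, 8889) are
placeholders for the 'xxx.xxx.xxx.xxx' and 'yyy' values in the document.

```python
import json


def primary_blockdev_add(host, port, node_name="rep",
                         shared_disk_id="primary_disk0",
                         export="hidden_disk0"):
    """Build the Primary-side blockdev-add command (hypothetical helper)."""
    return {
        "execute": "blockdev-add",
        "arguments": {
            "driver": "replication",
            "node-name": node_name,
            "mode": "primary",
            # Options proposed by this patch series
            "shared-disk-id": shared_disk_id,
            "shared-disk": True,
            # NBD client that forwards original data to the Secondary
            "file": {
                "driver": "nbd",
                "export": export,
                "server": {
                    "type": "inet",
                    # The document passes the port as a string
                    "data": {"host": host, "port": str(port)},
                },
            },
        },
    }


# Placeholder address and port; substitute the Secondary's NBD server
print(json.dumps(primary_blockdev_add("192.0.2.10", 8889), indent=2))
```

Building the command as a dict keeps the nesting explicit and guarantees
well-formed JSON output.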