From: zhanghailiang <zhang.zhanghailiang@huawei.com>
To: qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: stefanha@redhat.com, kwolf@redhat.com, mreitz@redhat.com,
pbonzini@redhat.com, wency@cn.fujitsu.com,
xiecl.fnst@cn.fujitsu.com,
zhanghailiang <zhang.zhanghailiang@huawei.com>,
Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Subject: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case
Date: Mon, 5 Dec 2016 16:34:59 +0800
Message-ID: <1480926904-17596-2-git-send-email-zhang.zhanghailiang@huawei.com>
In-Reply-To: <1480926904-17596-1-git-send-email-zhang.zhanghailiang@huawei.com>
Introduce the shared-disk block replication scenario
and describe how to use it.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
---
v2:
- fix some problems found by Changlong
---
docs/block-replication.txt | 139 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 135 insertions(+), 4 deletions(-)
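The copy-on-write rule in steps 1) to 4) of the shared-disk workflow can be made concrete with a toy model. This is only a sketch under stated assumptions. The class and method names are hypothetical, each disk is an in-memory list or dict of sectors, and every request covers exactly one sector. NBD transport and checkpoints are ignored entirely. The model uses the single "Disk Buffer" of the workflow picture; the architecture section splits the same role between the hidden disk and the active disk.

```python
"""Toy model of the shared-disk block replication workflow (steps 1-4).

Hypothetical sketch, not QEMU code: a disk is a list of per-sector byte
strings, and every request touches exactly one sector.
"""
from typing import Dict, List


class SharedDiskWorkflowModel:
    def __init__(self, shared_disk: List[bytes]) -> None:
        # The disk shared by Primary and Secondary; only the Primary writes it.
        self.shared_disk = shared_disk
        # The Secondary's Disk Buffer: sector index -> buffered content.
        self.disk_buffer: Dict[int, bytes] = {}

    def primary_write(self, sector: int, data: bytes) -> None:
        # (1) Read the original sector content from the shared disk.
        original = self.shared_disk[sector]
        # (2) Forward it to the Secondary, but never overwrite content that is
        #     already buffered (an earlier COW or a Secondary write).
        self.disk_buffer.setdefault(sector, original)
        # (3) Only then write the new data to the shared disk.
        self.shared_disk[sector] = data

    def secondary_write(self, sector: int, data: bytes) -> None:
        # (4) Secondary writes always overwrite the buffered content.
        self.disk_buffer[sector] = data

    def secondary_read(self, sector: int) -> bytes:
        # The buffer is read first; the shared disk acts as its backing file.
        if sector in self.disk_buffer:
            return self.disk_buffer[sector]
        return self.shared_disk[sector]


if __name__ == "__main__":
    model = SharedDiskWorkflowModel([b"A", b"B", b"C"])
    model.primary_write(0, b"P1")
    model.primary_write(0, b"P2")    # buffered b"A" is kept
    model.secondary_write(1, b"S1")
    model.primary_write(1, b"P3")    # buffered b"S1" is kept
    assert model.shared_disk == [b"P2", b"P3", b"C"]
    assert model.secondary_read(0) == b"A"   # Secondary still sees old data
    assert model.secondary_read(1) == b"S1"  # and its own writes
    assert model.secondary_read(2) == b"C"   # untouched sectors fall through
```

The `dict.setdefault()` call encodes "will not overwrite the existing sector content". The first copy of a sector wins, so the Secondary keeps seeing that sector as it was before the Primary's first write to it.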
diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..fbfe005 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
effort during a vmstate checkpoint, the disk modification operations of
the Primary disk are asynchronously forwarded to the Secondary node.
-== Workflow ==
+== Non-shared disk workflow ==
The following is the image of block replication workflow:
+----------------------+ +------------------------+
@@ -57,7 +57,7 @@ The following is the image of block replication workflow:
4) Secondary write requests will be buffered in the Disk buffer and it
will overwrite the existing sector content in the buffer.
-== Architecture ==
+== Non-shared disk architecture ==
We are going to implement block replication from many basic
blocks that are already in QEMU.
@@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
of the NBD server into the secondary disk. So before block replication,
the primary disk and secondary disk should contain the same data.
+== Shared disk workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                     |
+                  |                                    (4)
+                  |                                     V
+                  |                              /-------------\
+                  | (2)Forward and write through |             |
+                  |   +------------------------> | Disk Buffer |
+                  |   |                          |             |
+                  |   |                          \-------------/
+                  |   |(1)read                          |
+                  |   |                                 |
+         (3)write |   |                                 | backing file
+                  V   |                                 |
+          +-----------------------------+               |
+          |         Shared Disk         | <-------------+
+          +-----------------------------+
+
+ 1) Primary writes will read original data and forward it to Secondary
+ QEMU.
+ 2) Before Primary write requests are written to Shared disk, the
+ original sector content will be read from Shared disk and
+ forwarded and buffered in the Disk buffer on the secondary site,
+ but it will not overwrite the existing sector content (it could be
+ from either "Secondary Write Requests" or previous COW of "Primary
+ Write Requests") in the Disk buffer.
+ 3) Primary write requests will be written to Shared disk.
+ 4) Secondary write requests will be buffered in the Disk buffer and will
+ overwrite the existing sector content in the buffer.
+
+== Shared disk architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+ virtio-blk || .----------
+ / || | Secondary
+ / || '----------
+ / || virtio-blk
+ / || |
+ | || replication(5)
+ | NBD --------> NBD (2) |
+ | client || server ---> hidden disk <-- active disk(4)
+ | ^ || |
+ | replication(1) || |
+ | | || |
+ | +-----------------' || |
+ (3) |drive-backup sync=none || |
+--------. | +-----------------+ || |
+Primary | | | || backing |
+--------' | | || |
+ V | |
+ +-------------------------------------------+ |
+ | shared disk | <----------+
+ +-------------------------------------------+
+
+
+ 1) Primary writes will read original data and forward it to Secondary
+ QEMU.
+ 2) The hidden disk buffers the original content that is modified by the
+ primary VM. It should be an empty disk, and its driver must support
+ bdrv_make_empty() and backing files.
+ 3) Primary write requests will be written to Shared disk.
+ 4) Secondary write requests will be buffered in the active disk and will
+ overwrite the existing sector content in it.
+
== Failure Handling ==
There are 7 internal errors when block replication is running:
1. I/O error on primary disk
@@ -145,7 +213,7 @@ d. replication_stop_all()
things except failover. The caller must hold the I/O mutex lock if it is
in migration/checkpoint thread.
-== Usage ==
+== Non-shared disk usage ==
Primary:
-drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
children.0.file.filename=1.raw,\
@@ -234,6 +302,69 @@ Secondary:
The primary host is down, so we should do the following thing:
{ 'execute': 'nbd-server-stop' }
+== Shared disk usage ==
+Primary:
+ -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
+
+Issue qmp command:
+ { 'execute': 'blockdev-add',
+ 'arguments': {
+ 'driver': 'replication',
+ 'node-name': 'rep',
+ 'mode': 'primary',
+ 'shared-disk-id': 'primary_disk0',
+ 'shared-disk': true,
+ 'file': {
+ 'driver': 'nbd',
+ 'export': 'hidden_disk0',
+ 'server': {
+ 'type': 'inet',
+ 'data': {
+ 'host': 'xxx.xxx.xxx.xxx',
+ 'port': 'yyy'
+ }
+ }
+ }
+ }
+ }
+
+Secondary:
+ -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+ backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+ file.driver=qcow2,top-id=active-disk0,\
+ file.file.filename=/mnt/ramfs/active_disk.img,\
+ file.backing=hidden_disk0,shared-disk=on
+
+Issue qmp commands:
+1. { 'execute': 'nbd-server-start',
+ 'arguments': {
+ 'addr': {
+ 'type': 'inet',
+ 'data': {
+ 'host': '0',
+ 'port': 'yyy'
+ }
+ }
+ }
+ }
+2. { 'execute': 'nbd-server-add',
+ 'arguments': {
+ 'device': 'hidden_disk0',
+ 'writable': true
+ }
+ }
+
+After Failover:
+Primary:
+ { 'execute': 'x-blockdev-del',
+ 'arguments': {
+ 'node-name': 'rep'
+ }
+ }
+
+Secondary:
+ { 'execute': 'nbd-server-stop' }
+
TODO:
1. Continuous block replication
-2. Shared disk
--
1.8.3.1
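The QMP commands in the shared-disk usage section are plain JSON objects, so they can also be generated from a script. The Python sketch below builds them. The helper names and default arguments are hypothetical. The `shared-disk` and `shared-disk-id` options are the ones proposed by this RFC series, not necessarily options that upstream QEMU accepts. The host and port stay as the 'xxx.xxx.xxx.xxx' and 'yyy' placeholders. Sending the commands still needs a QMP connection on which `qmp_capabilities` has been issued first; that part is not shown.

```python
"""Build the shared-disk replication QMP commands as JSON (sketch)."""
import json
from typing import Any, Dict, List

QmpCommand = Dict[str, Any]


def primary_replication_cmd(host: str, port: str,
                            shared_disk_id: str = "primary_disk0",
                            export: str = "hidden_disk0",
                            node_name: str = "rep") -> QmpCommand:
    # Primary: a replication node on top of an NBD client that connects to
    # the Secondary's NBD server, which exports the hidden disk.
    return {
        "execute": "blockdev-add",
        "arguments": {
            "driver": "replication",
            "node-name": node_name,
            "mode": "primary",
            "shared-disk-id": shared_disk_id,
            "shared-disk": True,
            "file": {
                "driver": "nbd",
                "export": export,
                "server": {"type": "inet",
                           "data": {"host": host, "port": port}},
            },
        },
    }


def secondary_nbd_cmds(port: str,
                       device: str = "hidden_disk0") -> List[QmpCommand]:
    # Secondary: start the NBD server, then export the hidden disk writable.
    return [
        {"execute": "nbd-server-start",
         "arguments": {"addr": {"type": "inet",
                                "data": {"host": "0", "port": port}}}},
        {"execute": "nbd-server-add",
         "arguments": {"device": device, "writable": True}},
    ]


def failover_cmds(node_name: str = "rep") -> Dict[str, QmpCommand]:
    # After failover: the Primary deletes its replication node and the
    # Secondary stops its NBD server.
    return {
        "primary": {"execute": "x-blockdev-del",
                    "arguments": {"node-name": node_name}},
        "secondary": {"execute": "nbd-server-stop"},
    }


def to_qmp_line(cmd: QmpCommand) -> str:
    # Each QMP command is a single JSON object; json.dumps() uses double quotes.
    return json.dumps(cmd)


if __name__ == "__main__":
    for cmd in [primary_replication_cmd("xxx.xxx.xxx.xxx", "yyy"),
                *secondary_nbd_cmds("yyy")]:
        print(to_qmp_line(cmd))
```

Running the script prints the three setup commands as single-line JSON, with the placeholders left unchanged.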