From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51659)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1cBEGM-0002Ti-KI for qemu-devel@nongnu.org;
    Mon, 28 Nov 2016 00:14:00 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1cBEGL-0004e8-2z for qemu-devel@nongnu.org;
    Mon, 28 Nov 2016 00:13:58 -0500
References: <1476971860-20860-1-git-send-email-zhang.zhanghailiang@huawei.com>
    <1476971860-20860-2-git-send-email-zhang.zhanghailiang@huawei.com>
    <580F1FDE.8050401@cn.fujitsu.com>
From: Hailiang Zhang
Message-ID: <583BBCE6.5070307@huawei.com>
Date: Mon, 28 Nov 2016 13:13:10 +0800
MIME-Version: 1.0
In-Reply-To: <580F1FDE.8050401@cn.fujitsu.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH RFC 1/7] docs/block-replication: Add
    description for shared-disk case
To: Changlong Xie, qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: stefanha@redhat.com, kwolf@redhat.com, mreitz@redhat.com,
    pbonzini@redhat.com, wency@cn.fujitsu.com, Zhang Chen

On 2016/10/25 17:03, Changlong Xie wrote:
> On 10/20/2016 09:57 PM, zhanghailiang wrote:
>> Introuduce the scenario of shared-disk block replication
>> and how to use it.
>>
>> Signed-off-by: zhanghailiang
>> Signed-off-by: Wen Congyang
>> Signed-off-by: Zhang Chen
>> ---
>>  docs/block-replication.txt | 131 +++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 127 insertions(+), 4 deletions(-)
>>
>> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
>> index 6bde673..97fcfc1 100644
>> --- a/docs/block-replication.txt
>> +++ b/docs/block-replication.txt
>> @@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
>>  effort during a vmstate checkpoint, the disk modification operations of
>>  the Primary disk are asynchronously forwarded to the Secondary node.
>>
>> -== Workflow ==
>> +== Non-shared disk workflow ==
>>  The following is the image of block replication workflow:
>>
>>          +----------------------+            +------------------------+
>> @@ -57,7 +57,7 @@ The following is the image of block replication workflow:
>>       4) Secondary write requests will be buffered in the Disk buffer and it
>>          will overwrite the existing sector content in the buffer.
>>
>> -== Architecture ==
>> +== None-shared disk architecture ==
>
> s/None-shared/Non-shared/g
>
>>  We are going to implement block replication from many basic
>>  blocks that are already in QEMU.
>>
>> @@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
>>  of the NBD server into the secondary disk. So before block replication,
>>  the primary disk and secondary disk should contain the same data.
>>
>> +== Shared Disk Mode Workflow ==
>> +The following is the image of block replication workflow:
>> +
>> +        +----------------------+            +------------------------+
>> +        |Primary Write Requests|            |Secondary Write Requests|
>> +        +----------------------+            +------------------------+
>> +                  |                                       |
>> +                  |                                      (4)
>> +                  |                                       V
>> +                  |                              /-------------\
>> +                  | (2)Forward and write through |             |
>> +                  | +--------------------------> | Disk Buffer |
>> +                  | |                            |             |
>> +                  | |                            \-------------/
>> +                  | |(1)read                           |
>> +                  | |                                  |
>> +        (3)write  | |                                  | backing file
>> +                  V |                                  |
>> +                 +-----------------------------+       |
>> +                 |         Shared Disk         | <-----+
>> +                 +-----------------------------+
>> +
>> +    1) Primary writes will read original data and forward it to Secondary
>> +       QEMU.
>> +    2) Before Primary write requests are written to Shared disk, the
>> +       original sector content will be read from Shared disk and
>> +       forwarded and buffered in the Disk buffer on the secondary site,
>> +       but it will not overwrite the existing
>
> extra spaces at the end of line
>
>> +       sector content(it could be from either "Secondary Write Requests" or
>
> Need a space before "(" for better style.
>
>> +       previous COW of "Primary Write Requests") in the Disk buffer.
>> +    3) Primary write requests will be written to Shared disk.
>> +    4) Secondary write requests will be buffered in the Disk buffer and it
>> +       will overwrite the existing sector content in the buffer.
>> +
>> +== Shared Disk Mode Architecture ==
>> +We are going to implement block replication from many basic
>> +blocks that are already in QEMU.
>> +               virtio-blk           ||                                  .----------
>> +              /                     ||                                  | Secondary
>> +             /                      ||                                  '----------
>> +            /                       ||                                 virtio-blk
>> +           /                        ||                                     |
>> +          |                         ||                               replication(5)
>> +          |    NBD            -------->   NBD            (2)               |
>> +          |   client                ||   server ---> hidden disk <-- active disk(4)
>> +          |     ^                   ||                    |
>> +          | replication(1)          ||                    |
>> +          |     |                   ||                    |
>> +          |     +-----------------' ||                    |
>> +     (3)  |drive-backup sync=none   ||                    |
>> +--------. |     +-----------------+ ||                    |
>> +Primary | |     |                   ||            backing |
>> +--------' |     |                   ||                    |
>> +          V     |                                         |
>> + +-------------------------------------------+            |
>> + |                shared disk                | <----------+
>> + +-------------------------------------------+
>> +
>> +
>> +    1) Primary writes will read original data and forward it to Secondary
>> +       QEMU.
>> +    2) The hidden-disk buffers the original content that is modified by the
>> +       primary VM. It should also be an empty disk, and
>
> extra spaces at end of line
>
>> +       the driver supports bdrv_make_empty() and backing file.
>> +    3) Primary write requests will be written to Shared disk.
>> +    4) Secondary write requests will be buffered in the active disk and it
>> +       will overwrite the existing sector content in the buffer.
>> +
>>  == Failure Handling ==
>>  There are 7 internal errors when block replication is running:
>>  1. I/O error on primary disk
>> @@ -145,7 +213,7 @@ d. replication_stop_all()
>>     things except failover. The caller must hold the I/O mutex lock if it is
>>     in migration/checkpoint thread.
>>
>> -== Usage ==
>> +== Non-shared disk usage ==
>>  Primary:
>>    -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
>>           children.0.file.filename=1.raw,\
>> @@ -234,6 +302,61 @@ Secondary:
>>    The primary host is down, so we should do the following thing:
>>    { 'execute': 'nbd-server-stop' }
>>
>> +== Shared disk usage ==
>
> Keep the some coding style with "== Non-shared disk usage ==" part is
> good to me.
>
>> +Primary:
>> + -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
>> +
>> +Issue qmp command:
>> + {'execute': 'human-monitor-command',
>
> two space indentation for the whole "{...}" part
>
>> +   'arguments': {
>> +      'command-line': 'drive_add-nbuddydriver=replication,
>
> missing spaces
>
>> +                       mode=primary,
>> +                       file.driver=nbd,
>> +                       file.host=9.42.3.17,
>> +                       file.port=9998,
>> +                       file.export=hidden_disk0,
>> +                       shared-disk-id=primary_disk0,
>> +                       shared-disk=on,
>> +                       node-name=rep'
>
> Keep the whole commands after "command-line" in one line, or you can
> execute it correctly. IIRC
>

Hmm, I will change this HMP command to the QMP 'blockdev-add' command in the
next version, because it is supported now, though it is ready for production.

>> +   }
>> + }
>
> Secondary:
>
>> + -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
>> +        backing.driver=raw,backing.file.filename=1.raw \
>> + -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
>> +        file.driver=qcow2,top-id=active-disk0,\
>> +        file.file.filename=/mnt/ramfs/active_disk.img,\
>> +        file.backing=hidden_disk0,shared-disk=on
>> +
>> +Issue qmp command:
>> +1. {'execute': 'nbd-server-start',
>> +     'arguments': {
>> +        'addr': {
>> +           'type': 'inet',
>> +           'data': {
>> +              'host': '0',
>
> s/0/9.42.3.17/g, since you use designated ip address above
>
>> +              'port': '9998'
>> +           }
>> +        }
>> +     }
>> +   }
>> +2. {
>> +     'execute': 'nbd-server-add',
>> +     'arguments': {
>> +        'device': 'hidden_disk0',
>> +        'writable': true
>> +     }
>> +   }
>> +
>> +After Failover:
>> +Primary:
>> +{'execute': 'human-monitor-command',
>> +  'arguments': {
>> +     'command-line': 'drive_delrep'
>
> drive_del rep
>

I'll use the QMP command here instead.

>> +  }
>> +}
>> +
>> +Secondary:
>> + {'execute': 'nbd-server-stop' }
>> +
>>  TODO:
>>  1. Continuous block replication
>> -2. Shared disk
>>
>

I will fix all of the above problems in the next version, thanks.
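
For reference, here is a minimal sketch (not part of the patch) of how the two Secondary-side QMP commands quoted above could be generated as strict JSON, with the designated address from the review comment in place of '0'. The command names and arguments are copied from the quoted text. The helper name secondary_qmp_commands and the use of Python are assumptions made only for illustration.

```python
import json


def secondary_qmp_commands(host, port, device):
    """Build the two NBD server commands from the quoted usage text.

    Hypothetical helper for illustration; it only assembles the JSON
    and does not talk to a QEMU monitor.
    """
    start = {
        "execute": "nbd-server-start",
        "arguments": {
            "addr": {
                "type": "inet",
                # The quoted example passes the port as a string.
                "data": {"host": host, "port": str(port)},
            }
        },
    }
    add = {
        "execute": "nbd-server-add",
        "arguments": {"device": device, "writable": True},
    }
    return [start, add]


if __name__ == "__main__":
    # Values taken from the quoted example and the review comment.
    for cmd in secondary_qmp_commands("9.42.3.17", 9998, "hidden_disk0"):
        print(json.dumps(cmd))
```

Building the commands as dictionaries and serializing them also avoids the multi-line 'command-line' string issue raised in the review, since each command is emitted on one line.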