From: Wen Congyang <wency@cn.fujitsu.com>
To: Eric Blake <eblake@redhat.com>,
	qemu devel <qemu-devel@nongnu.org>, Fam Zheng <famz@redhat.com>,
	Max Reitz <mreitz@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	zhanghailiang <zhang.zhanghailiang@huawei.com>,
	qemu block <qemu-block@nongnu.org>,
	Jiang Yunhong <yunhong.jiang@intel.com>,
	Dong Eddie <eddie.dong@intel.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Michael R. Hines" <mrhines@linux.vnet.ibm.com>,
	Gonglei <arei.gonglei@huawei.com>,
	Yang Hongyang <yanghy@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH 10/16] docs: block replication's description
Date: Wed, 9 Sep 2015 16:22:06 +0800
Message-ID: <55EFEC2E.50608@cn.fujitsu.com>
In-Reply-To: <55E75EF9.304@redhat.com>

On 09/03/2015 04:41 AM, Eric Blake wrote:
> On 09/02/2015 02:51 AM, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>  docs/block-replication.txt | 183 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 183 insertions(+)
>>  create mode 100644 docs/block-replication.txt
>>
> 
> 
>> +
>> +    1) Primary write requests will be copied and forwarded to Secondary
>> +       QEMU.
>> +    2) Before Primary write requests are written to Secondary disk, the
>> +       original sector content will be read from Secondary disk and
>> +       buffered in the Disk buffer, but it will not overwrite the existing
>> +       sector content(it could be from either "Secondary Write Requests" or
> 
> space before '(' in English sentences.
> 
>> +       previous COW of "Primary Write Requests") in the Disk buffer.
>> +    3) Primary write requests will be written to Secondary disk.
>> +    4) Secondary write requests will be buffered in the Disk buffer and it
>> +       will overwrite the existing sector content in the buffer.
>> +
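
The four rules above can be sketched as a toy model. This is only an
illustration in Python, not the driver code: the class name, the dict-based
"disk" and the read path (buffer first, then Secondary disk) are assumptions
made for the example.

```python
class SecondaryDiskModel:
    """Toy model of rules 1-4; sector numbers are dict keys (hypothetical)."""

    def __init__(self, disk):
        self.disk = dict(disk)  # Secondary disk: sector -> content
        self.buffer = {}        # Disk buffer: sector -> content

    def primary_write(self, sector, data):
        # Rule 2: buffer the original content before it is overwritten,
        # but keep whatever is already buffered for this sector (a
        # Secondary write or an earlier COW of a Primary write).
        if sector not in self.buffer:
            self.buffer[sector] = self.disk.get(sector)
        # Rule 3: the forwarded Primary write lands on the Secondary disk.
        self.disk[sector] = data

    def secondary_write(self, sector, data):
        # Rule 4: Secondary writes always replace the buffered content.
        self.buffer[sector] = data

    def secondary_read(self, sector):
        # Assumed read path: the Secondary VM sees the buffer first.
        if sector in self.buffer:
            return self.buffer[sector]
        return self.disk.get(sector)
```

With this model, a Primary write to a sector changes the Secondary disk but
not what the Secondary VM reads back, and a later Primary write to the same
sector leaves the buffered content untouched.
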
>> +== Architecture ==
> 
>> +                3 NBD  ------->  3 NBD                                               |
>> +                client    ||     server                                          2 filter
>> +                          ||        ^                                                ^
>> +--------.                 ||        |                                                |
>> +Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
>> +--------'                 ||        |          backing        ^       backing
>> +                          ||        |                         |
>> +                          ||        |                         |
>> +                          ||        '-------------------------'
>> +                          ||           drive-backup sync=none
>> +
> 
>> +
>> +4) The disk on the secondary is represented by a custom block device
>> +(called active-disk). It should be an empty disk, and the format should
>> +support bdrv_make_empty() and backing file.
> 
> s/be an empty disk/start as an empty disk/
> 
>> +
>> +5) The hidden-disk is created automatically. It buffers the original content
>> +that is modified by the primary VM. It should also be an empty disk, and
> 
> s/be/start as/
> 
>> +the driver supports bdrv_make_empty() and backing file.
> 
> Missing mention that a drive-backup job is run to allow hidden-disk to
> buffer any state that would otherwise be lost by the speculative
> write-through of the NBD server into the secondary disk.
> 
>> +
>> +== Failure Handling ==
>> +There are 6 internal errors when block replication is running:
>> +1. I/O error on primary disk
>> +2. Forwarding primary write requests failed
>> +3. Backup failed
>> +4. I/O error on secondary disk
>> +5. I/O error on active disk
>> +6. Making active disk or hidden disk empty failed
>> +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
>> +4 and 6, we just report block replication's error to FT/HA manager(which
> 
> space before '('
> 
>> +decides when to do a new checkpoint, when to do failover).
>> +There is one internal error when doing failover:
>> +1. Commiting the data in active disk/hidden disk to secondary disk failed
> 
> s/Commiting/Committing/
> 
>> +We just to report this error to FT/HA manager.
>> +
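
The routing described in this section could be written down roughly as
follows. It is a sketch only: the enum member names and report_target() are
made up for the example, and only the case numbers and the two destinations
come from the text above.

```python
from enum import Enum


class ReplicationError(Enum):
    # Case numbers match the list in "Failure Handling"; names are hypothetical.
    PRIMARY_DISK_IO = 1
    FORWARD_PRIMARY_WRITE = 2
    BACKUP = 3
    SECONDARY_DISK_IO = 4
    ACTIVE_DISK_IO = 5
    MAKE_EMPTY = 6


# Cases 1 and 5 are reported to the disk layer.
TO_DISK_LAYER = {
    ReplicationError.PRIMARY_DISK_IO,
    ReplicationError.ACTIVE_DISK_IO,
}


def report_target(err):
    """Return who is told about a block replication error."""
    if err in TO_DISK_LAYER:
        return "disk layer"
    # Cases 2, 3, 4 and 6 go to the FT/HA manager, which decides
    # when to do a new checkpoint and when to fail over.
    return "FT/HA manager"
```
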
>> +== New block driver interface ==
> 
>> +
>> +== Usage ==
>> +Primary:
>> +  -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1\
>> +         children.0.file.filename=1.raw,\
>> +         children.0.driver=raw,\
>> +
>> +  Run qmp command in primary qemu:
>> +    child_add disk1 child.driver=replication,child.mode=primary,\
>> +              child.file.host=xxx,child.file.port=xxx,\
>> +              child.file.driver=nbd,child.ignore-errors=on
> 
> My comments earlier in this series mean this step should be two QMP
> commands: the first is blockdev-add to create an unassociated BDS, the
> second to then add that BDS into the quorum.
> 
>> +  Note:
>> +  1. There should be only one NBD Client for each primary disk.
>> +  2. host is the secondary physical machine's hostname or IP
>> +  3. Each disk must have its own export name.
>> +  4. It is all a single argument to -drive and child_add, and you should
>> +     ignore the leading whitespace.
>> +  5. The qmp command line must be run after running qmp command line in
>> +     secondary qemu.
>> +
>> +Secondary:
>> +  -drive if=none,driver=raw,file=1.raw,id=colo1 \
>> +  -drive if=xxx,driver=replication,mode=secondary,\
>> +         file.file.filename=active_disk.qcow2,\
>> +         file.driver=qcow2,\
>> +         file.backing.file.filename=hidden_disk.qcow2,\
>> +         file.backing.driver=qcow2,\
>> +         file.backing.allow-write-backing-file=on,\
>> +         file.backing.backing.backing_reference=colo1\
>> +
>> +  Then run qmp command in secondary qemu:
>> +    nbd-server-start host:port
>> +    nbd-server-add -w colo1
>> +
>> +  Note:
>> +  1. The export name in secondary QEMU command line is the secondary
>> +     disk's id.
>> +  2. The export name for the same disk must be the same
>> +  3. The qmp command nbd-server-start and nbd-server-add must be run
>> +     before running the qmp command migrate on primary QEMU
>> +  4. Don't use nbd-server-start's other options
>> +  5. Active disk, hidden disk and nbd target's length should be the
>> +     same.
>> +  6. It is better to put active disk and hidden disk in ramdisk.
>> +  7. It is all a single argument to -drive, and you should ignore
>> +     the leading whitespace.
> 
> Missing: document the steps taken during failover (that is, how do I
> promote a Secondary into a new Primary, and then attach a new Secondary
> to that point).  In particular, I suspect there may be differences

Continuous block replication is on the TODO list. But I think it is very
easy to implement if the quorum's children can be hot-added/removed.

> between whether you want to roll back to the state of the last
> checkpoint (in hidden_disk) or just go with the current state of the

With periodic checkpoints, the secondary VM is not running, so we just commit
hidden_disk to the secondary disk.
With COLO, the secondary VM is running and we need its state, so we commit the
active disk to the secondary disk (hidden_disk is committed as well).
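
As a rough sketch of that choice (Python used only for illustration; the
function name and the string labels are hypothetical, and this is not code
from the series):

```python
def images_to_commit(secondary_vm_running):
    """Return which images get committed into the secondary disk on failover."""
    if secondary_vm_running:
        # COLO: the Secondary VM's own state is needed, so the active
        # disk is committed and the hidden disk goes with it.
        return {"active disk", "hidden disk"}
    # Periodic checkpoint: the Secondary VM is not running, so only the
    # hidden disk is committed.
    return {"hidden disk"}
```
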

In which case would we need to drop the secondary disk and commit the hidden disk?

Thanks
Wen Congyang

> Secondary (in Active); either way, it probably involves doing an active
> commit of the state you want into Secondary, then the formation of a new
> quorum to start handing replication data off through a new NBD client
> connection.
> 
