Message-ID: <54C884BF.4000707@cn.fujitsu.com>
Date: Wed, 28 Jan 2015 14:42:07 +0800
From: Wen Congyang
References: <1419564708-17714-1-git-send-email-yanghy@cn.fujitsu.com> <549ECEF2.7080302@redhat.com>
In-Reply-To: <549ECEF2.7080302@redhat.com>
Subject: Re: [Qemu-devel] [PATCH RESEND 0/2] PoC: Block replication for continuous checkpointing
To: Paolo Bonzini, Yang Hongyang, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, quintela@redhat.com, GuiJianfeng@cn.fujitsu.com, yunhong.jiang@intel.com, eddie.dong@intel.com, dgilbert@redhat.com, mrhines@linux.vnet.ibm.com, stefanha@redhat.com, Amit Shah, walid.nouri@gmail.com

On 12/27/2014 11:23 PM, Paolo Bonzini wrote:
>
>
> On 26/12/2014 04:31, Yang Hongyang wrote:
>> Please feel free to comment.
>> We want as many comments and as much feedback as possible please, thanks in advance.
>
> Hi Yang,
>
> I think it's possible to build COLO block replication from many basic
> blocks that are already in QEMU. The only new piece would be the disk
> buffer on the secondary.
>
>          virtio-blk       ||
>              ^            ||                            .----------
>              |            ||                            | Secondary
>         1 Quorum          ||                            '----------
>          /      \         ||
>         /        \        ||
>    Primary      2 NBD  ------->  2 NBD
>      disk       client    ||     server                  virtio-blk
>                           ||        ^                         ^
> --------.                 ||        |                         |
> Primary |                 ||  Secondary disk <--------- COLO buffer 3
> --------'                 ||                   backing
>
>
> 1) The disk on the primary is represented by a block device with two
> children, providing replication between a primary disk and the host that
> runs the secondary VM. The read pattern patches for quorum
> (http://lists.gnu.org/archive/html/qemu-devel/2014-08/msg02381.html) can
> be used/extended to make the primary always read from the local disk
> instead of going through NBD.
>
> 2) The secondary disk receives writes from the primary VM through QEMU's
> embedded NBD server (speculative write-through).
>
> 3) The disk on the secondary is represented by a custom block device
> ("COLO buffer"). The disk buffer's backing image is the secondary disk,
> and the disk buffer uses bdrv_add_before_write_notifier to implement
> copy-on-write, similar to block/backup.c.
>
> 4) Checkpointing can use new bdrv_prepare_checkpoint and
> bdrv_do_checkpoint members in BlockDriver to discard the COLO buffer,
> similar to your patches (you did not explain why you do checkpointing in
> two steps). Failover instead is done with bdrv_commit or can even be
> done without stopping the secondary (live commit, block/commit.c).
>
>
> The missing parts are:
>
> 1) NBD server on the backing image of the COLO buffer. This means the
> backing image needs its own BlockBackend. Apart from this, no new
> infrastructure is needed to receive writes on the secondary.

The backing image is always opened read-only. How can this limitation be
removed? Should we add an option to control it?

Thanks
Wen Congyang

>
> 2) Read pattern support for quorum needs to be extended for the needs of
> the COLO primary. It may be simpler or faster to write a simple
> "replication" driver that writes to N children but always reads from the
> first. But in any case initial tests can be done with the quorum
> driver, even without read pattern support. Again, all the network
> infrastructure to replicate writes already exists in QEMU.
>
> 3) Of course the disk buffer itself.
>
> Paolo
>
>> Thanks,
>> Yang.
>>
>> Wen Congyang (1):
>>   PoC: Block replication for COLO
>>
>> Yang Hongyang (1):
>>   Block: Block replication design for COLO
>>
>>  block.c                   |  48 +++++++
>>  block/blkcolo.c           | 338 ++++++++++++++++++++++++++++++++++++++++++++++
>>  docs/blkcolo.txt          |  85 ++++++++++++
>>  include/block/block.h     |   6 +
>>  include/block/block_int.h |  21 +++
>>  5 files changed, 498 insertions(+)
>>  create mode 100644 block/blkcolo.c
>>  create mode 100644 docs/blkcolo.txt
>>
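
For illustration only, here is a toy in-memory model of the COLO buffer behaviour described in points 3) and 4) of the quoted proposal. It is not QEMU code: ColoDisk and every colo_* function are hypothetical names, and the sector geometry is arbitrary. It assumes one reading of the proposal, namely that a write arriving from the primary first saves the old sector into the buffer (copy-on-write), that a checkpoint discards the buffer, and that failover commits the buffer into the secondary disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_SECTORS 8   /* arbitrary toy geometry */
#define SECTOR_SIZE 4

/* Hypothetical model of the secondary side; none of this is a QEMU API. */
typedef struct {
    uint8_t backing[NUM_SECTORS][SECTOR_SIZE]; /* secondary disk */
    uint8_t buffer[NUM_SECTORS][SECTOR_SIZE];  /* COLO buffer contents */
    bool    in_buffer[NUM_SECTORS];            /* sector held in the buffer? */
} ColoDisk;

void colo_init(ColoDisk *d)
{
    memset(d, 0, sizeof(*d));
}

/* Write replicated from the primary (the NBD server path in the diagram).
 * Before the backing image changes, the old sector is copied into the
 * buffer so the secondary VM keeps seeing its own view. */
void colo_primary_write(ColoDisk *d, int s, const uint8_t *data)
{
    assert(s >= 0 && s < NUM_SECTORS);
    if (!d->in_buffer[s]) {
        memcpy(d->buffer[s], d->backing[s], SECTOR_SIZE);
        d->in_buffer[s] = true;
    }
    memcpy(d->backing[s], data, SECTOR_SIZE);
}

/* Write issued by the secondary VM: it lands in the buffer only. */
void colo_secondary_write(ColoDisk *d, int s, const uint8_t *data)
{
    assert(s >= 0 && s < NUM_SECTORS);
    memcpy(d->buffer[s], data, SECTOR_SIZE);
    d->in_buffer[s] = true;
}

/* Read issued by the secondary VM: buffer first, then the backing image. */
void colo_secondary_read(const ColoDisk *d, int s, uint8_t *out)
{
    assert(s >= 0 && s < NUM_SECTORS);
    memcpy(out, d->in_buffer[s] ? d->buffer[s] : d->backing[s], SECTOR_SIZE);
}

/* Checkpoint: drop the buffer, so the secondary sees the primary's data. */
void colo_checkpoint(ColoDisk *d)
{
    memset(d->in_buffer, 0, sizeof(d->in_buffer));
}

/* Failover: commit buffered sectors into the secondary disk. */
void colo_failover(ColoDisk *d)
{
    for (int s = 0; s < NUM_SECTORS; s++) {
        if (d->in_buffer[s]) {
            memcpy(d->backing[s], d->buffer[s], SECTOR_SIZE);
            d->in_buffer[s] = false;
        }
    }
}
```

In this model colo_primary_write is the only function that modifies the backing image while the secondary is running, which is why the question about opening the backing image read-write matters for the NBD server path.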