From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41065) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aFy2b-0006Gf-67 for qemu-devel@nongnu.org; Mon, 04 Jan 2016 00:50:50 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aFy2a-0006fs-7X for qemu-devel@nongnu.org; Mon, 04 Jan 2016 00:50:49 -0500 References: <1449034311-4094-1-git-send-email-wency@cn.fujitsu.com> <565E8395.9050004@cn.fujitsu.com> <20151223094759.GB11394@stefanha-x1.localdomain> From: Wen Congyang Message-ID: <568A0830.2020301@cn.fujitsu.com> Date: Mon, 4 Jan 2016 13:50:40 +0800 MIME-Version: 1.0 In-Reply-To: <20151223094759.GB11394@stefanha-x1.localdomain> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [Patch v12 resend 08/10] Implement new driver for block replication List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: Kevin Wolf , Fam Zheng , zhanghailiang , qemu block , Jiang Yunhong , Dong Eddie , qemu devel , "Michael R. Hines" , "Dr. David Alan Gilbert" , Gonglei , Paolo Bonzini , Max Reitz On 12/23/2015 05:47 PM, Stefan Hajnoczi wrote: > On Wed, Dec 02, 2015 at 01:37:25PM +0800, Wen Congyang wrote: >> + /* >> + * Only write to active disk if the sectors have >> + * already been allocated in active disk/hidden disk. >> + */ >> + qemu_iovec_init(&hd_qiov, qiov->niov); >> + while (remaining_sectors > 0) { >> + ret = bdrv_is_allocated_above(top, base, sector_num, >> + remaining_sectors, &n); > > There is a race condition here since multiple I/O requests can be in > flight at the same time. If two requests touch the same cluster > between top->base then the result of these checks could be unreliable. I don't think so. When we come here, primary qemu is gone, and failover is done. We only write to active disk if the sectors have already been allocated in active disk/hidden disk before failover. So it two requests touch the same cluster, it is OK, because the function bdrv_is_allocated_above()'s return value is not changed. > > The simple but slow solution is to use a CoMutex to serialize requests. > >> + if (ret < 0) { >> + return ret; >> + } >> + >> + qemu_iovec_reset(&hd_qiov); >> + qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * 512); >> + >> + target = ret ? top : base; >> + ret = bdrv_co_writev(target, sector_num, n, &hd_qiov); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + remaining_sectors -= n; >> + sector_num += n; >> + bytes_done += n * BDRV_SECTOR_SIZE; >> + } > > I think this can be replaced with an active commit block job that copies > data down from the hidden/active disk to the secondary disk. It is okay > to keep writing to the secondary disk while the block job is running and > then switch over to the secondary disk once it completes. Yes, active commit is another choice. IIRC, I don't use it because mirror job has some problem. It is fixed now(see bdrv_drained_begin()/bdrv_drained_end() in the mirror job). We will use mirror job in the next version. > >> + >> + return 0; >> +} >> + >> +static coroutine_fn int replication_co_discard(BlockDriverState *bs, >> + int64_t sector_num, >> + int nb_sectors) >> +{ >> + BDRVReplicationState *s = bs->opaque; >> + int ret; >> + >> + ret = replication_get_io_status(s); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + if (ret == 1) { >> + /* It is secondary qemu and we are after failover */ >> + ret = bdrv_co_discard(s->secondary_disk, sector_num, nb_sectors); > > What if the clusters are still allocated in the hidden/active disk? > What does discard do? Drop the data that allocated in the disk? If so, I think I make a misunderstand. I will fix it in the next version. Thanks Wen Congyang