From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <51B97C7B.4030708@linux.vnet.ibm.com>
Date: Thu, 13 Jun 2013 16:02:03 +0800
From: Wenchao Xia
MIME-Version: 1.0
References: <1369917299-5725-1-git-send-email-stefanha@redhat.com>
 <1369917299-5725-4-git-send-email-stefanha@redhat.com>
 <20130606035618.GA24375@localhost.nay.redhat.com>
 <20130606080513.GA13466@stefanha-thinkpad.redhat.com>
 <20130606085649.GA15648@localhost.nay.redhat.com>
 <20130607071812.GA16953@stefanha-thinkpad.redhat.com>
 <51B960BA.4050801@linux.vnet.ibm.com>
 <51B961A4.1060608@linux.vnet.ibm.com>
 <20130613063340.GA16044@localhost.nay.redhat.com>
In-Reply-To: <20130613063340.GA16044@localhost.nay.redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH v5 03/11] block: add basic backup support to block driver
To: Stefan Hajnoczi , Kevin Wolf , Stefan Hajnoczi , qemu-devel@nongnu.org,
 imain@redhat.com, Paolo Bonzini , dietmar@proxmox.com, Fam Zheng

On 2013-6-13 14:33, Fam Zheng wrote:
> On Thu, 06/13 14:07, Wenchao Xia wrote:
>> On 2013-6-13 14:03, Wenchao Xia wrote:
>>> On 2013-6-7 15:18, Stefan Hajnoczi wrote:
>>>> On Thu, Jun 06, 2013 at 04:56:49PM +0800, Fam Zheng wrote:
>>>>> On Thu, 06/06 10:05, Stefan Hajnoczi wrote:
>>>>>> On Thu, Jun 06, 2013 at 11:56:18AM +0800, Fam Zheng wrote:
>>>>>>> On Thu, 05/30 14:34, Stefan Hajnoczi wrote:
>>>>>>>> +
>>>>>>>> +static int coroutine_fn backup_before_write_notify(
>>>>>>>> +        NotifierWithReturn *notifier,
>>>>>>>> +        void *opaque)
>>>>>>>> +{
>>>>>>>> +    BdrvTrackedRequest *req = opaque;
>>>>>>>> +
>>>>>>>> +    return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL);
>>>>>>>> +}
>>>>>>>
>>>>>>> I'm wondering if we can see the logic here with a backing hd
>>>>>>> relationship? req->bs is a backing file of job->target, but guest is
>>>>>>> going to write to it, so we need to COW down the data to job->target
>>>>>>> before overwriting (i.e. cluster is not allocated in child).
>>>>>>>
>>>>>>> I think if we do this in block layer, there's not much necessity for a
>>>>>>> before-write notifier here (although it may be useful for other
>>>>>>> cases):
>>>>>>>
>>>>>>>     in bdrv_write:
>>>>>>>         for child in req->bs->open_children
>>>>>>>             if not child->is_allocated(req->sectors)
>>>>>>>                 do COW to child
>>>>>>>
>>>>>>> The advantage of this is that we won't need to start block-backup job in
>>>>>>> sync mode "none" to do point-in-time snapshot (image fleecing), and we
>>>>>>> get writable snapshot (possibility to open backing file writable and
>>>>>>> write to it safely) as a by-product.
>>>>>>>
>>>>>>> But we will need to keep track of parent<->child of block states, and we
>>>>>>> still need to take care of overlapping writing between block job and
>>>>>>> guest request.
>>>>>>
>>>>>> There's one catch here: bs->target may not support backing files, it can
>>>>>> be a raw file, for example. We'll only use backing files for
>>>>>> point-in-time snapshots but other use cases might not. raw doesn't
>>>>>> really implement is_allocated(), so the whole concept would have to
>>>>>> change a little:
>>>>>
>>>>> Another use case may be parent modification. Suppose we have
>>>>>
>>>>>               ,--- child1.qcow2
>>>>> parent.qcow2 <
>>>>>               `--- child2.qcow2
>>>>>
>>>>> We can use parent.qcow2 as block device in QEMU without breaking
>>>>> child1.qcow2 or child2.qcow2 by telling QEMU who its children are:
>>>>>
>>>>>   $QEMU -drive file=parent.qcow2,children=child1.qcow2:child2.qcow2
>>>>>
>>>>> Then we open the three images and setup parent_bs->open_children, the
>>>>> children are protected from being corrupted.
>>>>>
>>>>>>
>>>>>> bs->open_children becomes independent of backing files - any
>>>>>> BlockDriverState can be added to this list. ->is_allocated() basically
>>>>>> becomes the bitmap that we keep in the block job.
>>>>>
>>>>> Yes. But it is possible to keep a bitmap for raw (and those don't
>>>>> implement is_allocated()) in block layer too, or in overlay: could
>>>>> add-cow by Dongxu Wang help here?
>>>>
>>>> Yes absolutely.
>>>>
>>>> Stefan
>>>>
>>> One advantage of external backup, or backing up chain, is that it
>>> holds 'Delta' data only and is small enough. If it is changed toward a
>>> 'full' data writable snapshot, it becomes bigger. With backup chain
>>> qemu-img can restore/clone a writable and usable one, so I don't
>>> think adding that in qemu emulator helps much, and it will make things
>>> more complicated.... user won't care who is doing the job, qemu or
>>> qemu-img.
>>>
>> I mean that "get writable snapshot (possibility to open backing file
>> writable and write to it safely) as a by-product." in this series, is
>> not very valuable.
>>
>
> I'm not selling writable snapshot, my point was just that semantic of
> block-backup, getting a point-in-time snapshot, inherently works like a
> backing chain but writing to parent (guest drive) will not break its
> children (our thin PIT snapshot). If we see it this way, COW is not so
> specific to a block job like block-backup, it can be generic in the
> backing chain logic.
>
  OK, a similar thing happens in drive-mirror if you treat it as a
backing chain. To do that, the following assumptions need to be removed:
1. The top *bs is the active one.
2. Guest write requests go only to the top *bs.
3. *bs->backing_hd does not care about *bs (also a hidden change: maybe
   an image should remember its children as well as its parent).
  This would actually change the chain relationship from one direction
(top-down) into both directions. I think a separate series is needed
to do that.

> Though, the value in a writable snapshot is that we can actually
> _modify_ a backing image in place, rather than forking the chain to

  I can't see a good use case that requires modifying a backing image in
place; to me, a snapshot means a read-only one with a time stamp in the
past. Personally I think forbidding modification of a snapshot is
correct: in the design there should be pre-defined concepts to avoid
chaos in the architecture.

> write to the new child. This is not supported with qemu or qemu-img now,
> once you create a child with the image as backing file, you mustn't
> modify it.
>

-- 
Best Regards

Wenchao Xia
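
For reference, the two-way chain discussed above (an image that remembers
its children, with old data copied down to them before a write lands in
the parent) could be sketched roughly as below. This is a toy model only,
not QEMU code: ToyImage, toy_read(), toy_write() and the one-byte-per-
cluster layout are all hypothetical names and simplifications made up for
illustration, and overlapping requests between a block job and the guest
are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NB_CLUSTERS  8
#define MAX_CHILDREN 4

/* Hypothetical toy image: one byte per cluster plus an allocation bitmap. */
typedef struct ToyImage ToyImage;
struct ToyImage {
    uint8_t   data[NB_CLUSTERS];
    bool      allocated[NB_CLUSTERS];
    ToyImage *parent;                  /* backing image, or NULL */
    ToyImage *children[MAX_CHILDREN];  /* images that use us as backing */
    int       nb_children;
};

static void toy_init(ToyImage *img)
{
    memset(img, 0, sizeof(*img));
}

/* Record the relationship in both directions (parent <-> child). */
static void toy_add_child(ToyImage *parent, ToyImage *child)
{
    assert(parent->nb_children < MAX_CHILDREN);
    parent->children[parent->nb_children++] = child;
    child->parent = parent;
}

/* Read a cluster, falling through to the backing image when the
 * cluster is not allocated locally. */
static uint8_t toy_read(const ToyImage *img, int cluster)
{
    assert(cluster >= 0 && cluster < NB_CLUSTERS);
    while (img->parent && !img->allocated[cluster]) {
        img = img->parent;
    }
    return img->data[cluster];
}

/* Write a cluster. Before overwriting, copy the old content down to
 * every child that has no private copy yet, so the children keep
 * seeing the data they saw before the write. */
static void toy_write(ToyImage *img, int cluster, uint8_t value)
{
    assert(cluster >= 0 && cluster < NB_CLUSTERS);
    uint8_t old = toy_read(img, cluster);

    for (int i = 0; i < img->nb_children; i++) {
        ToyImage *child = img->children[i];
        if (!child->allocated[cluster]) {
            child->data[cluster] = old;
            child->allocated[cluster] = true;
        }
    }
    img->data[cluster] = value;
    img->allocated[cluster] = true;
}
```

In this model a write to the parent leaves every child's view of the
cluster unchanged, while a child that already holds its own copy is not
touched. Grandchildren need no extra handling here, because a child that
receives the copied-down data still returns the same content as before.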