Message-ID: <4C86584C.9070406@redhat.com>
Date: Tue, 07 Sep 2010 17:20:44 +0200
From: Kevin Wolf
Subject: Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
References: <4C864118.7070206@linux.vnet.ibm.com> <4C864D65.6090004@redhat.com> <4C86510E.9010303@linux.vnet.ibm.com> <4C8653E9.2070905@redhat.com> <4C86560D.9030308@linux.vnet.ibm.com>
In-Reply-To: <4C86560D.9030308@linux.vnet.ibm.com>
To: Anthony Liguori
Cc: libvir-list@redhat.com, qemu-devel, Stefan Hajnoczi

On 07.09.2010 17:11, Anthony Liguori wrote:
> On 09/07/2010 10:02 AM, Kevin Wolf wrote:
>> On 07.09.2010 16:49, Anthony Liguori wrote:
>>
>>>> Shouldn't it be a runtime option? You can use the very same image with
>>>> copy-on-read or copy-on-write and it will behave the same (except for
>>>> performance), so it's not an inherent feature of the image file.
>>>>
>>>>
>>> The way it's implemented in QED is that it's a compatible feature. This
>>> means that implementations are allowed to ignore it if they want to.
>>> It's really a suggestion.
>>>
>> Well, the point is that I see no reason why an image should contain this
>> suggestion.
>> There's really nothing about an image that could reasonably
>> indicate "use this better with copy-on-read than with copy-on-write".
>>
>> It's a decision you make when using the image.
>>
>
> Copy-on-read is, in many cases, a property of the backing file because
> it suggests that the backing file is either very slow or potentially
> volatile.

The simple copy-on-read without actively streaming the rest of the image
is not enough anyway for volatile backing files.

> IOW, let's say I'm an image distributor and I want to provide my images
> in a QED format that actually streams the image from an http server. I
> could provide a QED file without a copy-on-read bit set but I'd really
> like to convey this information as part of the image.
>
> You can argue that I should provide a config file too that contained the
> copy-on-read flag set but you could make the same argument about backing
> files too.

No. The image is perfectly readable when using COW instead of COR. On
the other hand, it's completely meaningless without its backing file.

>>> So yes, you could have a run time switch that overrides the feature bit
>>> on disk and either forces copy-on-read on or off.
>>>
>>> Do we have a way to pass block drivers run time options?
>>>
>> We'll get them with -blockdev. Today we're using colons for format
>> specific and separate -drive options for generic things.
>>
>
> That's right. I think I'd rather wait for -blockdev.

Well, then I consider -blockdev a dependency of QED (the copy-on-read
part at least) and we can't merge it before we have -blockdev.

>>> You need to understand the cluster boundaries in order to optimize the
>>> metadata updates. Sure, you can expose interfaces to the block layer to
>>> give all of this info but that's solving the same problem for doing
>>> block level copy-on-write.
>>>
>>> The other challenge is that for copy-on-read to be efficient, you
>>> really need a format that can distinguish between unallocated sectors
>>> and zero sectors and do zero detection during the copy-on-read
>>> operation. Otherwise, if you have a 10G virtual disk with a backing
>>> file that's 1GB in size, copy-on-read will result in the leaf being 10G
>>> instead of ~1GB.
>>>
>> That's a good point. But it's not a reason to make the interface
>> specific to QED just because other formats would probably not implement
>> it as efficiently.
>
> You really can't do as good of a job in the block layer because you have
> very little info about the characteristics of the disk image.

I'm not saying that the generic block layer should implement
copy-on-read. I just think that it should pass a run-time option to the
driver - maybe just a BDRV_O_COPY_ON_READ flag - instead of having the
information in the image file. From a user perspective it should look
the same for qed, qcow2 and whatever else (like copy-on-write today).

Kevin
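
For illustration only, here is a minimal sketch of how the two ideas in this thread could fit together: the copy-on-read decision arrives as an open flag rather than from the image, and the driver does zero detection so that all-zero clusters are recorded without storing data. This is a toy model, not QEMU code. The flag name BDRV_O_COPY_ON_READ comes from the message above, but its numeric value, the ToyImage type and all toy_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Flag name taken from the thread; the numeric value is made up here. */
#define BDRV_O_COPY_ON_READ 0x0400

#define CLUSTER_SIZE 4   /* tiny values so the example stays readable */
#define NUM_CLUSTERS 8

/* A format that can tell "unallocated" apart from "known to be zero". */
typedef enum { UNALLOCATED, ZERO, DATA } ClusterState;

/* Hypothetical toy image: a leaf on top of a read-only backing buffer. */
typedef struct ToyImage {
    const unsigned char *backing;                    /* backing file contents */
    unsigned char data[NUM_CLUSTERS][CLUSTER_SIZE];  /* leaf cluster storage */
    ClusterState state[NUM_CLUSTERS];
    int open_flags;                  /* run-time choice, not stored on disk */
} ToyImage;

static void toy_open(ToyImage *img, const unsigned char *backing, int flags)
{
    memset(img, 0, sizeof(*img));    /* every cluster starts UNALLOCATED */
    img->backing = backing;
    img->open_flags = flags;
}

static bool toy_is_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Read one cluster; the guest-visible data is the same with or without COR. */
static void toy_read(ToyImage *img, int cluster, unsigned char *out)
{
    assert(cluster >= 0 && cluster < NUM_CLUSTERS);

    switch (img->state[cluster]) {
    case DATA:
        memcpy(out, img->data[cluster], CLUSTER_SIZE);
        return;
    case ZERO:
        memset(out, 0, CLUSTER_SIZE);
        return;
    case UNALLOCATED:
        break;
    }

    /* Fall through to the backing file. */
    memcpy(out, img->backing + (size_t)cluster * CLUSTER_SIZE, CLUSTER_SIZE);

    if (!(img->open_flags & BDRV_O_COPY_ON_READ)) {
        return;                      /* plain copy-on-write: reads never allocate */
    }
    if (toy_is_zero(out, CLUSTER_SIZE)) {
        img->state[cluster] = ZERO;  /* zero detection: metadata only, no data */
    } else {
        memcpy(img->data[cluster], out, CLUSTER_SIZE);
        img->state[cluster] = DATA;
    }
}

/* Number of clusters that actually occupy data space in the leaf. */
static int toy_data_clusters(const ToyImage *img)
{
    int n = 0;
    for (int i = 0; i < NUM_CLUSTERS; i++) {
        if (img->state[i] == DATA) {
            n++;
        }
    }
    return n;
}
```

Opening the same backing buffer once with flags 0 and once with BDRV_O_COPY_ON_READ and reading every cluster returns identical data in both cases. Only the second image allocates anything, and only for clusters that are not all zeros, which is the 10G versus ~1GB effect described in the quoted text.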