From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=42111 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OszyP-0002em-7Z
	for qemu-devel@nongnu.org; Tue, 07 Sep 2010 11:20:39 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <kwolf@redhat.com>) id 1OszyN-0002O0-9u
	for qemu-devel@nongnu.org; Tue, 07 Sep 2010 11:20:36 -0400
Received: from mx1.redhat.com ([209.132.183.28]:34410)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <kwolf@redhat.com>) id 1OszyM-0002Np-VH
	for qemu-devel@nongnu.org; Tue, 07 Sep 2010 11:20:35 -0400
Message-ID: <4C86584C.9070406@redhat.com>
Date: Tue, 07 Sep 2010 17:20:44 +0200
From: Kevin Wolf <kwolf@redhat.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy
	block migration
References: <4C864118.7070206@linux.vnet.ibm.com> <4C864D65.6090004@redhat.com>
	<4C86510E.9010303@linux.vnet.ibm.com> <4C8653E9.2070905@redhat.com>
	<4C86560D.9030308@linux.vnet.ibm.com>
In-Reply-To: <4C86560D.9030308@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <aliguori@linux.vnet.ibm.com>
Cc: "libvir-list@redhat.com" <libvir-list@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

On 07.09.2010 17:11, Anthony Liguori wrote:
> On 09/07/2010 10:02 AM, Kevin Wolf wrote:
>> On 07.09.2010 16:49, Anthony Liguori wrote:
>>    
>>>> Shouldn't it be a runtime option? You can use the very same image with
>>>> copy-on-read or copy-on-write and it will behave the same (except for
>>>> performance), so it's not an inherent feature of the image file.
>>>>
>>>>        
>>> The way it's implemented in QED is that it's a compatible feature.  This
>>> means that implementations are allowed to ignore it if they want to.
>>> It's really a suggestion.
>>>      
>> Well, the point is that I see no reason why an image should contain this
>> suggestion. There's really nothing about an image that could reasonably
>> indicate "use this better with copy-on-read than with copy-on-write".
>>
>> It's a decision you make when using the image.
>>    
> 
> Copy-on-read is, in many cases, a property of the backing file because 
> it suggests that the backing file is either very slow or potentially 
> volatile.

Plain copy-on-read, without actively streaming the rest of the image, is
not enough for volatile backing files anyway: clusters that the guest
never reads would never be copied.

> IOW, let's say I'm an image distributor and I want to provide my images 
> in a QED format that actually streams the image from an http server.  I 
> could provide a QED file without a copy-on-read bit set but I'd really 
> like to convey this information as part of the image.
> 
> You can argue that I should provide a config file too that contained the 
> copy-on-read flag set but you could make the same argument about backing 
> files too.

No. The image is perfectly readable when using COW instead of COR. On
the other hand, it's completely meaningless without its backing file.
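
To illustrate the difference, here is a toy model (not QED or qcow2
code; struct toy_image, toy_read() and TOY_CLUSTERS are names made up
for this mail, and the backing file is assumed to be a flat, fully
allocated image). A read returns the same data with or without
copy-on-read, only the allocation state of the image changes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CLUSTERS 4  /* hypothetical tiny image, one byte per cluster */

/* Toy image: per-cluster data plus an allocation map. */
struct toy_image {
    uint8_t data[TOY_CLUSTERS];
    bool allocated[TOY_CLUSTERS];
    const struct toy_image *backing;  /* NULL if there is no backing file */
};

/*
 * Read one cluster. With copy_on_read set, data fetched from the backing
 * file is also stored in the image itself. The returned value is the
 * same either way.
 */
static uint8_t toy_read(struct toy_image *img, int cluster, bool copy_on_read)
{
    assert(cluster >= 0 && cluster < TOY_CLUSTERS);

    if (img->allocated[cluster]) {
        return img->data[cluster];
    }

    /* Unallocated: take the data from the backing file (zeros if none). */
    uint8_t value = img->backing ? img->backing->data[cluster] : 0;

    if (copy_on_read) {
        img->data[cluster] = value;
        img->allocated[cluster] = true;
    }
    return value;
}
```

Drop the backing pointer, however, and every unallocated cluster reads
back as zeros, which is what I mean by "meaningless".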

>>> So yes, you could have a run time switch that overrides the feature bit
>>> on disk and either forces copy-on-read on or off.
>>>
>>> Do we have a way to pass block drivers run time options?
>>>      
>> We'll get them with -blockdev. Today we're using colons for format
>> specific and separate -drive options for generic things.
>>    
> 
> That's right.  I think I'd rather wait for -blockdev.

Well, then I consider -blockdev a dependency of QED (the copy-on-read
part at least) and we can't merge it before we have -blockdev.

>>> You need to understand the cluster boundaries in order to optimize the
>>> metadata updates.  Sure, you can expose interfaces to the block layer to
>>> give all of this info but that's solving the same problem for doing
>>> block level copy-on-write.
>>>
>>> The other challenge is that for copy-on-read to be efficient, you
>>> really need a format that can distinguish between unallocated sectors
>>> and zero sectors and do zero detection during the copy-on-read
>>> operation.  Otherwise, if you have a 10G virtual disk with a backing
>>> file that's 1GB in size, copy-on-read will result in the leaf being 10G
>>> instead of ~1GB.
>>>      
>> That's a good point. But it's not a reason to make the interface
>> specific to QED just because other formats would probably not implement
>> it as efficiently.
> 
> You really can't do as good of a job in the block layer because you have 
> very little info about the characteristics of the disk image.

I'm not saying that the generic block layer should implement
copy-on-read. I just think that it should pass a run-time option to the
driver - maybe just a BDRV_O_COPY_ON_READ flag - instead of having the
information in the image file. From a user perspective it should look
the same for qed, qcow2 and whatever else (like copy-on-write today).
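
Roughly what I have in mind, as a sketch only: the flag does not exist
yet, its value below is a placeholder, and struct toy_driver_state and
toy_driver_open() are made-up names rather than existing block layer
interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Proposed flag; the bit value is a placeholder chosen for this sketch. */
#define BDRV_O_COPY_ON_READ 0x0400

/* Hypothetical per-driver state; a real driver keeps far more than this. */
struct toy_driver_state {
    bool copy_on_read;
};

/*
 * Hypothetical open hook: every format driver receives the same open
 * flags and decides for itself how to implement copy-on-read. Nothing
 * is read from the image header here.
 */
static void toy_driver_open(struct toy_driver_state *s, int open_flags)
{
    assert(s != NULL);
    s->copy_on_read = (open_flags & BDRV_O_COPY_ON_READ) != 0;
}
```

The caller would set the bit from whatever run-time option we end up
with, in exactly the same way for every format.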

Kevin