From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=55458 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OuqDI-0004Rl-8R
	for qemu-devel@nongnu.org; Sun, 12 Sep 2010 13:19:38 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <anthony@codemonkey.ws>) id 1OuqDG-00058N-BT
	for qemu-devel@nongnu.org; Sun, 12 Sep 2010 13:19:36 -0400
Received: from mail-gy0-f173.google.com ([209.85.160.173]:56676)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <anthony@codemonkey.ws>) id 1OuqDG-00058F-3F
	for qemu-devel@nongnu.org; Sun, 12 Sep 2010 13:19:34 -0400
Received: by gya1 with SMTP id 1so2163194gya.4
	for <qemu-devel@nongnu.org>; Sun, 12 Sep 2010 10:19:33 -0700 (PDT)
Message-ID: <4C8D0BA3.7050706@codemonkey.ws>
Date: Sun, 12 Sep 2010 12:19:31 -0500
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy
	block migration
References: <4C864118.7070206@linux.vnet.ibm.com>	<4C864D65.6090004@redhat.com>	<AANLkTim7UHH3r__3C_Ad3oB1rnXyRsH7bcuZw+rBQP6=@mail.gmail.com>	<4C8652CB.9060801@linux.vnet.ibm.com>
	<4C8CCA91.4060001@redhat.com> <4C8CD4DB.9020905@codemonkey.ws>
	<4C8CD847.8030804@redhat.com> <4C8CF07C.5040509@codemonkey.ws>
	<4C8D0394.6010605@redhat.com>
In-Reply-To: <4C8D0394.6010605@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Avi Kivity <avi@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Stefan Hajnoczi <stefanha@gmail.com>, qemu-devel <qemu-devel@nongnu.org>, "libvir-list@redhat.com" <libvir-list@redhat.com>, Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

On 09/12/2010 11:45 AM, Avi Kivity wrote:
>> Streaming relies on copy-on-read to do the writing.
>
>
> Ah.  You can avoid the copy-on-read implementation in the block format 
> driver and do it completely in generic code.

Copy-on-read takes advantage of temporal locality.  You wouldn't want to 
stream without copy-on-read: without that caching, repeated guest reads 
keep hitting the backing file, which decreases your idle I/O time.
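
To make the caching point concrete, here is a toy sketch (not the QED 
code; every name in it is hypothetical and the cluster size is tiny on 
purpose).  A read that misses the image fetches the cluster from the 
backing file once and keeps a local copy, so streaming reduces to a 
background loop of ordinary reads:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CLUSTER_SIZE 4   /* tiny on purpose, for illustration only */
#define NUM_CLUSTERS 4

/* Hypothetical toy image: local data plus an allocation bitmap. */
struct toy_image {
    unsigned char data[NUM_CLUSTERS][CLUSTER_SIZE];
    bool allocated[NUM_CLUSTERS];
    const unsigned char (*backing)[CLUSTER_SIZE]; /* read-only backing file */
    unsigned backing_reads;                       /* reads that hit the backing file */
};

/* Read one cluster; on a miss, fetch from backing and keep a local copy. */
static void toy_read_cluster(struct toy_image *img, size_t idx,
                             unsigned char *buf)
{
    if (!img->allocated[idx]) {
        memcpy(img->data[idx], img->backing[idx], CLUSTER_SIZE);
        img->allocated[idx] = true;
        img->backing_reads++;
    }
    memcpy(buf, img->data[idx], CLUSTER_SIZE);
}

/* Streaming is then a background pass of reads over every cluster. */
static void toy_stream_all(struct toy_image *img)
{
    unsigned char scratch[CLUSTER_SIZE];

    for (size_t i = 0; i < NUM_CLUSTERS; i++) {
        toy_read_cluster(img, i, scratch);
    }
}
```

Clusters the guest has already read cost the streaming pass no extra 
backing-file I/O, which is the temporal-locality win.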

>>>     stream_4():
>>>         increment offset
>>>         if more:
>>>              bdrv_aio_stream()
>>>
>>>
>>> Of course, need to serialize wrt guest writes, which adds a bit more 
>>> complexity.  I'll leave it to you to code the state machine for that.
>>
>> http://repo.or.cz/w/qemu/aliguori.git/commitdiff/d44ea43be084cc879cd1a33e1a04a105f4cb7637?hp=34ed425e7dd39c511bc247d1ab900e19b8c74a5d 
>>
>
> Clever - it pushes all the synchronization into the copy-on-read 
> implementation.  But the serialization there hardly jumps out of the 
> code.
>
> Do I understand correctly that you can only have one allocating read 
> or write running?

Cluster allocation, L2 cache allocation, or on-disk L2 allocation?

You only have one on-disk L2 allocation at a time.  That's just an 
implementation detail at the moment.  An on-disk L2 allocation happens 
only when writing to a new cluster that requires a totally new L2 
entry.  Since each L2 table covers 2GB of logical space, that's a rare 
event, so this turns out to be pretty reasonable for a first 
implementation.
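
For reference, the 2GB figure falls out of the table geometry.  A quick 
sketch of the arithmetic, assuming 64KB clusters, L2 tables of four 
clusters (256KB), and 8-byte table entries; those numbers are my 
assumptions here rather than anything stated above, and the function 
name is made up:

```c
#include <stdint.h>

/* Logical bytes mapped by one L2 table: one table entry per cluster. */
static uint64_t l2_coverage_bytes(uint64_t cluster_size,
                                  uint64_t table_clusters,
                                  uint64_t entry_size)
{
    uint64_t table_bytes = cluster_size * table_clusters;
    uint64_t entries = table_bytes / entry_size;

    return entries * cluster_size;
}
```

With the assumed values, l2_coverage_bytes(64 * 1024, 4, 8) gives 32768 
entries of 64KB each, i.e. 2GB per L2 table.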

Parallel on-disk L2 allocations are not that difficult; they're just a 
future TODO.

>>
>> Generally, I think the block layer makes more sense if the interface 
>> to the formats is high level and code sharing is achieved not by 
>> mandating a world view but rather by making libraries of common 
>> functionality.   This is more akin to how the FS layer works in Linux.
>>
>> So IMHO, we ought to add a bdrv_aio_commit function, turn the current 
>> code into a generic_aio_commit, implement a qed_aio_commit, then 
>> somehow do qcow2_aio_commit, and look at what we can refactor into 
>> common code.
>
> What Linux does is have an equivalent of bdrv_generic_aio_commit() 
> which most implementations call (or default to), and only do something 
> if they want something special.  Something like commit (or 
> copy-on-read, or copy-on-write, or streaming) can be implemented 100% in 
> terms of the generic functions (and indeed qcow2 backing files can be 
> any format).

Yes, what I'm really saying is that we should take the 
bdrv_generic_aio_commit() approach.  I think we're in agreement here.
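
Roughly the shape I have in mind, as a toy sketch only.  All of the 
toy_* names are hypothetical, this is not the existing block layer API, 
and the calls are synchronous to keep it short.  The high-level entry 
point uses the format's own hook if it has one and otherwise defaults 
to the generic helper, which a format can also call from its own hook:

```c
#include <stddef.h>

struct toy_bdrv;

/* Hypothetical driver table: a format may leave aio_commit NULL. */
struct toy_driver {
    const char *format_name;
    int (*aio_commit)(struct toy_bdrv *bs);
};

struct toy_bdrv {
    const struct toy_driver *drv;
    int generic_commits;   /* calls that reached the generic helper */
    int special_commits;   /* calls that went through a format hook */
};

/* Shared library code that any format can call or default to. */
static int toy_generic_aio_commit(struct toy_bdrv *bs)
{
    bs->generic_commits++;
    return 0;
}

/* A format that wants something special wraps the generic helper. */
static int toy_qed_aio_commit(struct toy_bdrv *bs)
{
    bs->special_commits++;
    return toy_generic_aio_commit(bs);
}

/* High-level entry point: use the format's hook, else the generic one. */
static int toy_bdrv_aio_commit(struct toy_bdrv *bs)
{
    if (bs->drv->aio_commit) {
        return bs->drv->aio_commit(bs);
    }
    return toy_generic_aio_commit(bs);
}
```

A format with no hook gets the generic behaviour for free, and nothing 
forces a format to structure its metadata around the generic code.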

Regards,

Anthony Liguori