From: Avi Kivity <avi@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Kevin Wolf <kwolf@redhat.com>,
Stefan Hajnoczi <stefanha@gmail.com>,
qemu-devel <qemu-devel@nongnu.org>,
"libvir-list@redhat.com" <libvir-list@redhat.com>,
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
Date: Sun, 12 Sep 2010 19:31:07 +0200
Message-ID: <4C8D0E5B.7010106@redhat.com>
In-Reply-To: <4C8D0BA3.7050706@codemonkey.ws>
On 09/12/2010 07:19 PM, Anthony Liguori wrote:
> On 09/12/2010 11:45 AM, Avi Kivity wrote:
>>> Streaming relies on copy-on-read to do the writing.
>>
>>
>> Ah. You can avoid the copy-on-read implementation in the block
>> format driver and do it completely in generic code.
>
> Copy on read takes advantage of temporal locality. You wouldn't want
> to stream without copy on read because you decrease your idle I/O time
> by not effectively caching.
I meant, implement copy-on-read in generic code side by side with
streaming. Streaming becomes just a prefetch operation (read and
discard) which lets copy-on-read do the rest. This is essentially your
implementation, yes?
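A minimal sketch of the idea above, assuming a toy in-memory image: copy-on-read lives in one generic read path, and streaming is a loop that reads every cluster and throws the data away. The names `Image`, `read_cluster`, `write_cluster` and `stream` are hypothetical and are not QEMU APIs; the sketch is single-threaded, so it does not show the serialization against in-flight guest writes that a real implementation needs.

```python
CLUSTER_SIZE = 4  # tiny cluster size so the example is easy to follow


class Image:
    """Toy image: a sparse overlay on top of a read-only backing file."""

    def __init__(self, backing: bytes):
        self.backing = backing
        self.clusters = {}  # cluster index -> data allocated in the image
        self.num_clusters = (len(backing) + CLUSTER_SIZE - 1) // CLUSTER_SIZE

    def _read_backing(self, index):
        start = index * CLUSTER_SIZE
        return self.backing[start:start + CLUSTER_SIZE]

    def read_cluster(self, index):
        """Generic copy-on-read: populate the image on a miss."""
        if index not in self.clusters:
            self.clusters[index] = self._read_backing(index)
        return self.clusters[index]

    def write_cluster(self, index, data):
        """Guest write: lands in the image, never in the backing file."""
        self.clusters[index] = data


def stream(image):
    """Streaming as prefetch: read every cluster and discard the data."""
    for index in range(image.num_clusters):
        image.read_cluster(index)  # result intentionally ignored


if __name__ == "__main__":
    img = Image(b"abcdefgh")
    img.write_cluster(1, b"WXYZ")  # guest write before streaming reaches it
    stream(img)
    print(img.clusters)
```

Because `stream()` only ever calls the normal read path, a cluster the guest has already written is left alone: the only place that decides whether to copy from the backing file is `read_cluster()`.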
>
>>>> stream_4():
>>>>     increment offset
>>>>     if more:
>>>>         bdrv_aio_stream()
>>>>
>>>>
>>>> Of course, need to serialize wrt guest writes, which adds a bit
>>>> more complexity. I'll leave it to you to code the state machine
>>>> for that.
>>>
>>> http://repo.or.cz/w/qemu/aliguori.git/commitdiff/d44ea43be084cc879cd1a33e1a04a105f4cb7637?hp=34ed425e7dd39c511bc247d1ab900e19b8c74a5d
>>>
>>
>> Clever - it pushes all the synchronization into the copy-on-read
>> implementation. But the serialization there hardly jumps out of the
>> code.
>>
>> Do I understand correctly that you can only have one allocating read
>> or write running?
>
> Cluster allocation, L2 cache allocation, or on-disk L2 allocation?
>
> You only have one on-disk L2 allocation at one time. That's just an
> implementation detail at the moment. An on-disk L2 allocation happens
> only when writing to a new cluster that requires a totally new L2
> entry. Since L2s cover 2GB of logical space, it's a rare event so
> this turns out to be pretty reasonable for a first implementation.
>
> Parallel on-disk L2 allocations are not that difficult; it's just a
> future TODO.
Really, you can just preallocate all L2s. Most filesystems will touch
all of them very soon. qcow2 might save some space for snapshots which
share L2s (doubtful) or for 4k clusters (historical), but for qed with
64k clusters, it doesn't save any space.

Linear L2s will also make your fsck *much* quicker. Size is about .01%
of logical image size: 1MB for a 10GB guest. By the time you install
something on it, that's a drop in the bucket.

If you install a guest on a 100GB disk, what percentage of L2s are
allocated?
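The size estimate above can be checked with a few lines of arithmetic. This is a rough sketch, assuming each L2 entry is an 8-byte offset and that an L2 table occupies 256 KiB; both values are assumptions chosen because they reproduce the "L2s cover 2GB" figure quoted earlier, and none of the names below come from QEMU.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

CLUSTER_SIZE = 64 * KIB    # qed cluster size mentioned above
L2_ENTRY_SIZE = 8          # assumption: one 64-bit offset per cluster
L2_TABLE_SIZE = 256 * KIB  # assumption: chosen to match the 2GB figure


def l2_bytes(image_size, cluster_size=CLUSTER_SIZE):
    """Bytes of L2 entries needed to map every cluster of the image."""
    clusters = -(-image_size // cluster_size)  # ceiling division
    return clusters * L2_ENTRY_SIZE


def l2_overhead(cluster_size=CLUSTER_SIZE):
    """Fraction of the logical image size spent on L2 entries."""
    return L2_ENTRY_SIZE / cluster_size


def l2_table_coverage(cluster_size=CLUSTER_SIZE):
    """Logical bytes mapped by a single L2 table."""
    return (L2_TABLE_SIZE // L2_ENTRY_SIZE) * cluster_size


if __name__ == "__main__":
    print(l2_table_coverage() // GIB, "GiB per L2 table")
    print(l2_bytes(10 * GIB) / MIB, "MiB of L2s for 10 GiB")
    print(l2_bytes(100 * GIB) / MIB, "MiB of L2s for 100 GiB")
    print(f"{l2_overhead():.4%} overhead with 64k clusters")
    print(f"{l2_overhead(4 * KIB):.4%} overhead with 4k clusters")
```

Under these assumptions the fully preallocated L2s come to 1.25 MiB for a 10 GiB image and 12.5 MiB for 100 GiB, roughly 0.012% of the logical size, which is in line with the ".01%" estimate. With 4k clusters the same tables would cost about 0.2%, which is where lazy allocation had more to offer.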
>
>>>
>>> Generally, I think the block layer makes more sense if the interface
>>> to the formats is high level and code sharing is achieved not by
>>> mandating a world view but rather by making libraries of common
>>> functionality. This is more akin to how the FS layer works in Linux.
>>>
>>> So IMHO, we ought to add a bdrv_aio_commit function, turn the
>>> current code into a generic_aio_commit, implement a qed_aio_commit,
>>> then somehow do qcow2_aio_commit, and look at what we can refactor
>>> into common code.
>>
>> What Linux does is have an equivalent of bdrv_generic_aio_commit()
>> which most implementations call (or default to), and only do
>> something if they want something special. Something like commit (or
>> copy-on-read, or copy-on-write, or streaming) can be implemented 100%
>> in terms of the generic functions (and indeed qcow2 backing files can
>> be any format).
>
> Yes, what I'm really saying is that we should take the
> bdrv_generic_aio_commit() approach. I think we're in agreement here.
>
Strange feeling.
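The pattern agreed on in the exchange above, a generic helper that formats default to and override only when they need something special, can be sketched roughly as follows. Everything here is hypothetical: the `Driver` class, the dictionary-based image and `qed_commit` are illustrations only, and `generic_commit` merely stands in for the proposed bdrv_generic_aio_commit(); none of this is existing QEMU code.

```python
def generic_commit(image):
    """Default commit: copy every allocated cluster into the backing store."""
    for index, data in image["clusters"].items():
        image["backing"][index] = data
    image["clusters"].clear()
    return "generic"


def qed_commit(image):
    """Hypothetical format-specific commit that reuses the generic helper."""
    # Format-specific preparation would go here (purely illustrative).
    generic_commit(image)
    return "qed"


class Driver:
    """A format driver; the commit hook defaults to the generic helper."""

    def __init__(self, name, commit=None):
        self.name = name
        self.commit = commit or generic_commit


DRIVERS = {
    "qcow2": Driver("qcow2"),                 # nothing special: uses default
    "qed": Driver("qed", commit=qed_commit),  # wants something special
}


def bdrv_commit(driver, image):
    """Dispatch to whichever commit implementation the driver provides."""
    return driver.commit(image)


if __name__ == "__main__":
    image = {"backing": {0: b"old"}, "clusters": {0: b"new", 1: b"x"}}
    print(bdrv_commit(DRIVERS["qcow2"], image), image["backing"])
```

The design choice is that sharing happens through a library function rather than through a mandated structure: a driver that supplies no hook gets the generic behaviour, and one that does can still call the helper for the common part.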
--
error compiling committee.c: too many arguments to function