qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Avi Kivity <avi@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Stefan Hajnoczi <stefanha@gmail.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"libvir-list@redhat.com" <libvir-list@redhat.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
Date: Sun, 12 Sep 2010 18:45:08 +0200	[thread overview]
Message-ID: <4C8D0394.6010605@redhat.com> (raw)
In-Reply-To: <4C8CF07C.5040509@codemonkey.ws>

  On 09/12/2010 05:23 PM, Anthony Liguori wrote:
> On 09/12/2010 08:40 AM, Avi Kivity wrote:
>> Why would it serialize all I/O operations?  It's just like another 
>> vcpu issuing reads.
>
> Because the block layer isn't re-entrant.

A threaded block layer is reentrant.  Of course pushing the thing into a 
thread requires that.

>
>>> What you basically do is:
>>>
>>> stream_step_three():
>>>    complete()
>>>
>>> stream_step_two(offset, length):
>>>    bdrv_aio_readv(offset, length, buffer, stream_step_three)
>>>
>>> bdrv_aio_stream():
>>>     bdrv_aio_find_free_cluster(stream_step_two)
>>
>> Isn't there a write() missing somewhere?
>
> Streaming relies on copy-on-read to do the writing.

Ah.  You can avoid the copy-on-read implementation in the block format 
driver and do it completely in generic code.

>
>>>
>>> And that's exactly what the current code looks like.  The only 
>>> change to the patch that this does is make some of qed's internals 
>>> be block layer interfaces.
>>
>> Why do you need find_free_cluster()?  That's a physical offset 
>> thing.  Just write to the same logical offset.
>>
>> IOW:
>>
>>     bdrv_aio_stream():
>>         bdrv_aio_read(offset, stream_2)
>
> It's an optimization.  If you've got a fully missing L1 entry, then 
> you're going to memset() 2GB worth of zeros.  That's just wasted 
> work.  With a 1TB image with a 1GB allocation, it's a huge amount of 
> wasted work.

Ok.  And it's a logical offset, not physical as I thought, which 
confused me.

>
>>     stream_2():
>>         if all zeros:
>>             increment offset
>>             if more:
>>                 bdrv_aio_stream()
>>        bdrv_aio_write(offset, stream_3)
>>
>>     stream_3():
>>         bdrv_aio_write(offset, stream_4)
>
> I don't understand why stream_3() is needed.

This implementation doesn't rely on copy-on-read code in the block 
format driver.  It is generic and uses existing block layer interfaces.  
It would need copy-on-read support in the generic block layer as well.

>
>>     stream_4():
>>         increment offset
>>         if more:
>>              bdrv_aio_stream()
>>
>>
>> Of course, need to serialize wrt guest writes, which adds a bit more 
>> complexity.  I'll leave it to you to code the state machine for that.
>
> http://repo.or.cz/w/qemu/aliguori.git/commitdiff/d44ea43be084cc879cd1a33e1a04a105f4cb7637?hp=34ed425e7dd39c511bc247d1ab900e19b8c74a5d 
>

Clever - it pushes all the synchronization into the copy-on-read 
implementation.  But the serialization there hardly jumps out of the code.

Do I understand correctly that you can only have one allocating read or 
write running?

>> Parts of it are: commit.  Of course, that's horribly synchronous.
>
> If you've got AIO internally, making commit work is pretty easy.  
> Doing asynchronous commit at a generic layer is not easy though unless 
> you expose lots of details.

I don't see why.  Commit is a simple loop that copies all clusters.  All 
it needs to know is if a cluster is allocated or not.

When commit is running you need additional serialization against guest 
writes, and to direct guest writes and reads to the committed region to 
the backing file instead of the temporary image.  But the block layer 
already knows of all guest writes.

>
> Generally, I think the block layer makes more sense if the interface 
> to the formats are high level and code sharing is achieved not by 
> mandating a world view but rather but making libraries of common 
> functionality.   This is more akin to how the FS layer works in Linux.
>
> So IMHO, we ought to add a bdrv_aio_commit function, turn the current 
> code into a generic_aio_commit, implement a qed_aio_commit, then 
> somehow do qcow2_aio_commit, and look at what we can refactor into 
> common code.

What Linux does if have an equivalent of bdrv_generic_aio_commit() which 
most implementations call (or default to), and only do something if they 
want something special.  Something like commit (or copy-on-read, or 
copy-on-write, or streaming) can be implement 100% in terms of the 
generic functions (and indeed qcow2 backing files can be any format).

-- 
error compiling committee.c: too many arguments to function

  reply	other threads:[~2010-09-12 16:45 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-07 13:41 [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration Anthony Liguori
2010-09-07 14:01 ` Alexander Graf
2010-09-07 14:31   ` Anthony Liguori
2010-09-07 14:33 ` Stefan Hajnoczi
2010-09-07 14:51   ` Anthony Liguori
2010-09-07 14:55     ` Stefan Hajnoczi
2010-09-07 15:00       ` Anthony Liguori
2010-09-07 15:09         ` Stefan Hajnoczi
2010-09-07 15:20           ` Anthony Liguori
2010-09-08  8:26           ` Kevin Wolf
2010-09-07 14:34 ` Kevin Wolf
2010-09-07 14:49   ` Stefan Hajnoczi
2010-09-07 14:57     ` Anthony Liguori
2010-09-07 15:05       ` Stefan Hajnoczi
2010-09-07 15:23         ` Anthony Liguori
2010-09-12 12:41       ` Avi Kivity
2010-09-12 13:25         ` Anthony Liguori
2010-09-12 13:40           ` Avi Kivity
2010-09-12 15:23             ` Anthony Liguori
2010-09-12 16:45               ` Avi Kivity [this message]
2010-09-12 17:19                 ` Anthony Liguori
2010-09-12 17:31                   ` Avi Kivity
2010-09-07 14:49   ` Anthony Liguori
2010-09-07 15:02     ` Kevin Wolf
2010-09-07 15:11       ` Anthony Liguori
2010-09-07 15:20         ` Kevin Wolf
2010-09-07 15:30           ` Anthony Liguori
2010-09-07 15:39             ` Kevin Wolf
2010-09-07 16:00               ` Anthony Liguori
2010-09-07 15:03 ` [Qemu-devel] " Daniel P. Berrange
2010-09-07 15:16   ` Anthony Liguori
2010-09-12 10:55 ` [Qemu-devel] " Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C8D0394.6010605@redhat.com \
    --to=avi@redhat.com \
    --cc=anthony@codemonkey.ws \
    --cc=kwolf@redhat.com \
    --cc=libvir-list@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).